Validating women’s reports of antenatal and postnatal care received in Bangladesh, Cambodia and Kenya

Background Global indicators for monitoring progress in maternal and newborn health have tended to rely on contact coverage indicators rather than the content of services received. As part of the effort to improve measurement of progress in maternal and newborn health, this study examines how accurately women can report on information and health interventions received during an antenatal or postnatal health consultation at health facilities in Bangladesh, Cambodia and Kenya.
Methods We conducted secondary analysis of matched observation and client interview data to compare women’s reports of care received at exit interview with observation by a trained third-party observer. We assessed indicator accuracy by calculating sensitivity, specificity, area under the receiver operating characteristic curve (AUC) and inflation factor (IF). Indicators considered to have both high individual accuracy (an AUC value of 0.70 or greater) and low population-level bias (0.75<IF<1.25) were considered to have acceptable validity. In addition, we considered the number of countries where both validation criteria were met.
Results For indicators of antenatal care, we found 16 of 18 indicators in Bangladesh, 3 of 6 in Cambodia and 3 of 8 in Kenya met both validation criteria. For postnatal care, we found evidence of acceptable validity for 6 of 8 indicators in Bangladesh, 5 of 14 in Cambodia and 3 of 16 in Kenya. In general, we documented higher validity for indicators related to concrete, observable actions, as opposed to information or advice given. Women were more likely to recall care received for themselves, rather than for their newborn.
Conclusions Women reported accurately on multiple aspects of antenatal and postnatal care. While we describe broad patterns in the types of indicators likely to be recalled with accuracy, differences by setting warrant further investigation. Findings inform efforts to better monitor the coverage and quality of maternal and newborn health interventions.

What is already known?
► Self-reported data obtained in population-based household surveys are frequently used to determine global and national coverage for maternal and newborn health interventions (ie, the proportion of individuals in need of an intervention who receive it).
► Prior validation studies have suggested that, with few exceptions, women are not able to report accurately on care received in the intrapartum or immediate postnatal period (up to 1 hour following birth).
► Few studies have examined how accurately women can report on the content of care received during an antenatal or postnatal health consultation for themselves or their newborn.

What are the new findings?
► Validation analyses of women's immediate recall of facility-based care across three countries suggest that women can accurately report on several interventions in the antenatal and postnatal periods; however, there were differences by setting.
► Women were more likely to report accurately on concrete, observable interventions as opposed to information or advice given.

What do the new findings imply?
► In the context of calls for enhanced measurement of the components that lead to effective coverage, findings suggest that careful consideration of the type of information women are asked to recall is needed.
► As new indicators are proposed, they should be subject to validity tests using variations in wording, recall period and setting.

INTRODUCTION
Service contact indicators such as attending antenatal care (ANC) within the first 14 weeks of gestation, delivering at a health facility and receipt of postnatal care (PNC) in the first 2 days of birth have been widely used to track progress towards national and international health goals. 1 In many low-income and middle-income countries (LMICs), household surveys are the best or only available data on the coverage of maternal and newborn services. 2 Household survey programmes such as the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) collect information from women via a series of questions in a face-to-face interview. Yet there are measurement challenges associated with these indicators.
A growing body of literature highlights the wide discrepancies between indicators of service contact and those that measure receipt of high-quality services and associated benefits in maternal and newborn care. [3][4][5][6][7] For example, an analysis of DHS data from 41 countries found that, among women reporting four or more ANC visits for a birth in the preceding 2 years, a substantial percentage of women did not receive the recommended services, with a gap between expected and actual coverage ranging from 18% to 86% across countries. 6 To highlight such gaps, a coverage cascade framework that identifies key losses in service quality has been proposed to measure effective coverage. 3 Similarly, updated WHO guidelines for maternal and newborn care emphasise the quality and provision of appropriate and timely interventions (quality-adjusted coverage) in addition to service contact indicators. 8 9 Further, several global strategies such as the Global Strategy for Women's, Children's and Adolescents' Health, 10 Every Newborn Action Plan, 11 and Ending Preventable Maternal Mortality, in addition to the Sustainable Development Goal agenda, 12 have set renewed targets to improve maternal and newborn health by 2030. These initiatives also include indicators (either in use or under development) that directly relate to the quantity and quality of care of women and newborns. 1 Despite this, few questions in surveys such as MICS and DHS reflect women's receipt of specific health interventions. Most of the questions relate to antenatal and intrapartum interventions. ANC service coverage interventions currently measured in DHS, for example, include an ANC visit in the first trimester, measurement of blood pressure, tetanus toxoid vaccination, urine testing, counselling about danger signs, HIV counselling and testing, iron-folate supplementation, and malaria prevention. 
In contrast, despite accounting for more maternal deaths than any other phase of pregnancy, no PNC interventions for the mother and few for the newborn are currently tracked in the DHS or MICS. More recently, there have been efforts to change this omission. For example, an optional module on pregnancy and PNC was recently added to the DHS. Further, several aspects of newborn PNC have been recently added to the core DHS-7 questionnaires. These include whether within the first 2 days of birth the provider examined the cord, measured the infant's temperature, counselled the mother on danger signs for newborns and counselled/observed the mother breast feeding.
While the expansion in routinely tracked maternal and newborn indicators is encouraging, data availability must be weighed against the accuracy of what is measured. A growing, although limited, body of research has sought to assess the validity of women's recall of maternal and newborn health interventions by comparing women's reports with a 'gold standard' measure. Recent research has used observations by a trained observer as the reference standard. [13][14][15][16][17][18] Taken together, these studies suggest that women are not likely to accurately recall interventions received during the intrapartum or immediate postnatal period (within an hour of birth). 13-15 17 A study of similar design that assessed women's immediate recall following a return PNC health visit in Kenya and eSwatini suggests that the accuracy of women's reports is greater for select postnatal indicators relative to intrapartum and immediate postnatal indicators, although there were some differences by setting. 16 A study conducted in rural China compared women's recall of ANC and PNC with medical records rather than observer report. 19 To the best of our knowledge, this is the only study to date to have also examined the validity of reporting of interventions received as part of routine ANC, which were generally found to be recalled with poor specificity. 19
Given the renewed focus on measurement of quality-adjusted coverage, as well as the limited number of studies that have sought to assess self-reported antenatal or postnatal care interventions, additional validation work is warranted. The present study aims to extend research findings to date by assessing the validity of a set of antenatal and postnatal service coverage indicators that reflect a range of recommended interventions and counselling procedures across three different settings.

METHODS
We compared women's reports of antenatal or postnatal care received against observations by a trained third-party observer using a structured checklist in health facilities located in Bangladesh, Cambodia and Kenya. Women's reports of care received were collected via exit interview prior to them leaving the health facility following a health visit for themselves or their newborn. Data were originally collected as part of an evaluation of a voucher and accreditation intervention led by the Population Council in Bangladesh, Cambodia and Kenya, with the National Institute of Public Health in Cambodia and Research Training and Management (RTM) International in Bangladesh as partners. The primary objective of the evaluation was to assess the influence of the voucher programme (henceforth 'Voucher programme') on maternal and newborn health service utilisation, including antenatal and postnatal care. A secondary objective was to determine whether the voucher programme improved service quality by verifying service delivery through reimbursements to providers. Country-specific study protocols that detail data collection processes have been previously published. [20][21][22] As the study relied on extant data, indicators were not identified a priori. Survey questionnaires were reviewed for indicators which were reported by both the observers and the women for validation purposes. Online supplementary table 1 displays which indicators align with global and national development goals, as well as those which have the potential to be included in such efforts. Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research. In each setting, approximately half of the facilities were accredited to provide maternal and newborn health and family planning services to women holding vouchers for the services (ie, voucher intervention facilities). 
Voucher facilities were compared with a sample of non-accredited (control) facilities from the same/similar districts. To reduce the potential for selection bias related to facility enrolment, pairwise matching was also used to match facilities on factors hypothesised to influence provider behaviour a priori, including profile of clientele, location and fees charged, type of practice, and skills mix. 21 22 In total, 3169 women were interviewed and observed for ANC (n=1036 in Bangladesh, 957 in Cambodia and 1176 in Kenya) and 2462 for PNC (n=208 in Bangladesh, 635 in Cambodia and 1619 in Kenya).
In Bangladesh, all health facilities were government upazila health complexes, located in 22 upazilas (subdistricts) from six divisions: Barisal, Chittagong, Dhaka, Khulna, Rajshahi and Sylhet. The majority (77%) of facilities offered comprehensive obstetric care, while about a quarter offered basic obstetric care (23%). More than half of facilities had a skilled provider for caesarean delivery and six had capacity for blood transfusion. Out of the 22 facilities, 17 provided referral to a district hospital or maternal and child welfare centre. Nationally, 29% of deliveries to women took place in a health facility in the 3 years preceding the 2011 DHS, 26% of pregnant women received four or more antenatal visits, and 28% of women received a postnatal health check in the first 2 days of birth, while nearly two-thirds of mothers (61%) received no PNC. 23
In Cambodia, 40 public/government health facilities were located in 5 provinces (Kampong Thom, Kampong Speu, Prey Veng, Kampot, and Takeo). All but two health facilities were health centres; two were former district hospitals. Most facilities (68%) had a single bed. Half of facilities were located 15 km or less from the nearest referral hospital, while 35% were located at a distance of 10 km or less. At the national level, 54% of women with a birth in the preceding 5 years delivered in a health facility, 60% of pregnant women reported receiving four or more ANC visits, and 70% of women who gave birth in the 5 years before the interview received PNC within the first 2 days of delivery. 24 Of the mothers, 26% received no PNC. In the provinces where data collection took place, PNC was notably lower than the national average: reports of no PNC ranged from 17% in Kampot/Kep to 53% in Oddor Mean Chey. 24
In Kenya, 62 facilities were located in Kisumu, Kiambu, Kitui counties, and the Korogocho and Viwandani informal settlements in Nairobi. 
Facilities were either public (n=40), private-for-profit (n=10), faith-based (n=9) or non-governmental organisation (n=3). The majority of facilities were hospitals (61%), followed by health centres (31%) and nursing homes, dispensaries or clinics (8%). The national prevalence of facility delivery in Kenya was 43% at the time of the most recent DHS survey that preceded data collection (2008-2009), 47% of pregnant women reported receiving four or more ANC visits, and 42% of women who gave birth in the 5 years preceding the survey received a postnatal check-up within 2 days of birth, while 53% received no PNC. 25

Data management
In Cambodia and Kenya, unique identification codes for the client exit interview and observation record of received care were matched. In Bangladesh, identification codes were generated by combining information on facility, date, type of service received and end time of observation/start time of interview. If there was ambiguity (a time lag of more than 45 min or more than one feasible match), the case was excluded. The process resulted in 1244 matches out of 2228 cases (56% match) (1036 cases for ANC and 208 for PNC). We performed a sensitivity analysis by restricting the maximum time difference to 30 min and expanding it to 60 min, with minimal impact on sample size (<50 case difference). Given the conservative approach used, we are confident in the accuracy of the matching process. Data for each cross-sectional year of data collection were pooled for each country. Questions about whether interventions occurred were coded 1 if the response was 'Yes' and 0 if 'No'.

Sample size
We anticipated indicator prevalence would range between 50% and 80% coverage because assessed indicators were health-promoting rather than harmful practices. We assumed levels of moderate to high sensitivity (60%-70%) and specificity (70%-80%), given recall was immediate. The sample size for anticipated sensitivity and specificity levels was calculated using Buderer's formula. 26 We set α=0.05 for both accuracy parameters assuming a normal approximation to a binomial distribution. Based on these specifications, a sample size of 400 women per country is sufficient to estimate 60% sensitivity and 70% specificity with at least 7% precision.
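As a rough illustration of this calculation, the sketch below implements the commonly cited form of Buderer's formula, in which the number of women needed for a target precision on sensitivity is scaled by the prevalence, and the number needed for specificity by one minus the prevalence. The function name, argument names and the 50% prevalence used in the example are assumptions made for illustration; the source does not state the exact inputs behind the figure of 400 women per country.

```python
import math
from statistics import NormalDist


def buderer_sample_size(sensitivity, specificity, prevalence, precision, alpha=0.05):
    """Return (n for sensitivity, n for specificity) under a normal approximation.

    precision is the half-width of the confidence interval (e.g. 0.07 for 7 points).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    n_se = z ** 2 * sensitivity * (1 - sensitivity) / (precision ** 2 * prevalence)
    n_sp = z ** 2 * specificity * (1 - specificity) / (precision ** 2 * (1 - prevalence))
    return math.ceil(n_se), math.ceil(n_sp)


# Illustrative inputs: 60% sensitivity, 70% specificity, 7-point precision,
# and an assumed indicator prevalence of 50%.
n_se, n_sp = buderer_sample_size(0.60, 0.70, prevalence=0.50, precision=0.07)
required = max(n_se, n_sp)
```

Under these assumed inputs both totals fall below 400. At prevalences further from 50% the requirement for one of the two parameters grows, because fewer observed positives or observed negatives are available.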

Validation analysis
We constructed two-by-two tables and calculated sensitivity (the true positive rate) and specificity (the true negative rate) for each indicator. A greater number of indicators were assessed in Bangladesh due to differences in questionnaires, which allowed more aspects of care to be matched between observer and client reports. 'Don't Know' responses were excluded from validity estimates but are reported in tables 1 and 2. We estimated the area under the receiver operating characteristic curve (AUC) and corresponding 95% CI following a binomial distribution. The AUC can be interpreted as 'the average sensitivity across all possible specificities'. 27 An AUC of 0.5 indicates an uninformative test and an AUC of 1.0 represents perfect accuracy (100% sensitivity and 100% specificity). 27 An AUC value of 0.7 or higher was used as the cut-off criterion for high individual-level reporting accuracy. 28 To assess population-level validity for each indicator, we calculated the degree to which an indicator would be overestimated or underestimated in a household survey using the inflation factor (IF). Specifically, the IF is the ratio of the indicator's estimated population-based survey prevalence to the indicator's 'true' (observed) prevalence. To estimate the population-based survey prevalence (Pr), we applied the indicator's estimated sensitivity (SE) and specificity (SP) to its true observed prevalence (P), using the following equation: Pr = P × (SE + SP - 1) + (1 - SP). 28 We used an IF cut-off between 0.75 and 1.25 as the benchmark for low population-level bias. 28 Indicators based on a small number of true (observed) positive or true negative cases, which resulted in estimated precision for SE or SP of 15 percentage points or more, are reported in the data tables but not discussed in the text. The summary measures AUC and IF are also suppressed for these indicators due to a high degree of uncertainty around the estimate. 
All analyses were performed using R Studio V.1.1.383 (RStudio, Boston, Massachusetts).
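The measures defined in this section can be sketched in a few lines of Python. This is not the authors' R code: the `validate_indicator` helper, its argument names and the example counts are hypothetical. It also assumes that, because each indicator is a single yes/no report with one operating point on the ROC curve, the AUC reduces to the mean of sensitivity and specificity; the source does not spell out how the AUC was computed.

```python
def validate_indicator(tp, fp, fn, tn, auc_cutoff=0.70, if_low=0.75, if_high=1.25):
    """Summarise one indicator from a two-by-two table of report vs observation.

    tp: reported yes, observed yes    fn: reported no, observed yes
    fp: reported yes, observed no     tn: reported no, observed no
    Assumes at least one observed positive and one observed negative.
    """
    se = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    p = (tp + fn) / (tp + fp + fn + tn)      # observed ('true') prevalence P
    pr = p * (se + sp - 1) + (1 - sp)        # estimated survey prevalence Pr
    inflation = pr / p                       # inflation factor IF
    auc = (se + sp) / 2                      # assumption: single binary cut-point
    meets_both = auc >= auc_cutoff and if_low < inflation < if_high
    return {"SE": se, "SP": sp, "AUC": auc, "IF": inflation, "meets_both": meets_both}


# Hypothetical counts for one indicator (not taken from the study tables).
summary = validate_indicator(tp=80, fp=10, fn=20, tn=90)
```

With these made-up counts the indicator has SE 0.80, SP 0.90, AUC 0.85 and IF 0.90, so it would meet both criteria. An IF below 1 means a survey would understate true coverage; an IF above 1 means it would overstate it.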

RESULTS
Sample description
The characteristics of women who attended an antenatal or postnatal consultation for themselves or their newborn are presented in table 3. Women's age ranged from 18 to 52 years, with a median age of 24 years (IQR: 21-28) for antenatal clients and a median age of 25 years (IQR: 22-30) for postnatal clients. For antenatal clients, approximately half of women in Cambodia and Kenya were attending their first visit for their current pregnancy, while 65% of women were attending their first visit in Bangladesh. Bangladesh and Kenya ANC clients were most likely to be in their third trimester (50% and 49%), whereas 27% of women in Cambodia were in their third trimester. More than one-third of women had one prior pregnancy in all countries (40% in Bangladesh, 36% in Cambodia and 34% in Kenya). Of the women, 15% in Bangladesh, 11% in Cambodia and 5% in Kenya had no school or less than primary school as their highest educational attainment.
For postnatal clients, 13% of women in Bangladesh, 15% in Cambodia and 22% in Kenya had four or more prior births. The age of the infant was less than 2 weeks for 19% of the sample in Bangladesh, 58% in Cambodia and 37% in Kenya. The proportion of women whose highest education was less than primary school completion was similar to that of ANC clients (21% Bangladesh, 12% Cambodia and 6% Kenya).

ANC service coverage indicators
For ANC we assessed 18 indicators in Bangladesh, 6 in Cambodia and 8 in Kenya (table 1). Out of the three countries, the greatest accuracy was observed in Bangladesh: 16 of 18 indicators met both validation criteria. A similar proportion of ANC indicators in Kenya (3 of 8) and Cambodia (3 of 6) met both validation criteria. Across ANC indicators, responses of 'Don't Know' were minimal (<1%) (table 1).
Two indicators were assessed in all three countries: whether the mother was screened for anaemia and whether the fetal heart rate was checked. Both indicators met both validation criteria in two of the three countries. Of the eight indicators assessed in two countries, four indicators (whether the woman's blood pressure was checked, an abdominal examination was performed, a urine screen was performed and whether a nurse/midwife attended the woman during the consultation) met both validation criteria in both countries where they were tested. Whether the woman's weight was measured and whether a doctor or medical resident attended the consultation met both criteria in one of two countries. Two indicators of ANC counselling (whether the woman was counselled on the status of her pregnancy and whether the provider gave her a date to return for care) had lower SP (ranging between 21.5% and 54.5% across countries) and did not meet the AUC criterion in either of the two countries where they were tested.
Of 10 indicators unique to Bangladesh, 8 met both validation criteria. Of these indicators, the lowest SE (55.9, 95% CI 49.5 to 62.0) was observed for whether the woman was referred for or received an ultrasonogram. The lowest SP was observed for whether the woman was informed on possible pregnancy-related complications (69.5, 95% CI 66.4 to 72.6). In contrast, the highest SE (98.0, 95% CI 96.6 to 98.9) and SP (98.0, 95% CI 96.7 to 98.9) were observed for whether a family welfare visitor attended the ANC consultation. This indicator has relevance for both DHS and MICS as the type(s) of provider(s) who attended the consultation is routinely tracked. Notably, whether the woman received a blood test during ANC also had relatively high SE (71.6, 95% CI 66.4 to 76.3) and SP (98.0, 95% CI 96.7 to 98.9). This indicator is currently tracked in both the DHS and MICS core questionnaires.

PNC service coverage indicators
Similar to ANC, PNC indicators tested in Bangladesh had the highest accuracy (6 of 8 met both validation criteria), followed by 5 of 14 in Cambodia and 3 of 16 in Kenya. The two PNC indicators which did not meet both validation criteria in Bangladesh were whether the provider discussed danger signs for the mother (SE: 36.7, 95% CI 23.4 to 51.7; SP: 89.9, 95% CI 84.2 to 94.1) and whether the provider gave information on baby sickness signs (SE: 31.0, 95% CI 22.1 to 41.0; SP: 97.0, 95% CI 91.6 to 99.4). Of the five indicators which met both validation criteria in Cambodia, four related to maternal, rather than newborn, care. These were whether the provider checked the mother's blood pressure, performed a breast examination, conducted an abdominal examination or checked for anaemia. The one validated item pertaining to newborn care was whether the provider discussed breast feeding or feeding for the baby with the mother. 
Similarly, two of the three indicators in Kenya which met both validation criteria pertained to aspects of the maternal consultation: whether the provider checked for excessive bleeding or discussed family planning with the mother. The one item found to be of acceptable SE and SP in Kenya regarding newborn care was whether the provider immunised the baby.

DISCUSSION
This study extends prior validation research related to maternal and newborn care by assessing the validity of ANC and PNC service coverage indicators 16 using data from three additional LMIC settings. Our results show that, at discharge, women are able to report with accuracy on multiple aspects of antenatal and postnatal care, particularly in contrast to the very few intrapartum and immediate postnatal (within 1 hour of birth) indicators that have met validation criteria. [13][14][15][16][17] However, study findings also demonstrate considerable variability by survey question and setting. Despite this heterogeneity, we identified several broad patterns by type of indicator.
The first is a general pattern of higher reporting accuracy for indicators related to concrete, observable interventions. For example, most ANC indicators which met both accuracy criteria in at least two countries reflected health checks during a physical examination of the mother (blood pressure check, abdominal examination, anaemia screening, urine test and fetal heart rate monitoring). The same trend was observed for PNC indicators. Indicators which reflected physical examination of the mother were more likely than indicators of counselling interventions, provider type or newborn care to meet both validation criteria in half or more of settings tested.
In contrast, ANC and PNC indicators that reflected more abstract concepts, particularly those pertaining to counselling or advice given, performed less reliably. None of the following indicators met both validation criteria in any of the countries tested: in ANC, whether the woman was counselled on the status of her pregnancy or given a return date; in PNC, whether the provider gave information on sexually transmitted infections (STIs) and HIV/AIDS, discussed danger signs for the mother or gave information on the baby's sickness signs. For nearly all these indicators, the AUC fell below the benchmark of 0.70, with the exception of being given information on STIs and HIV/AIDS. Notably, at the time of data collection in Kenyan facilities, there was emphasis on provider-initiated HIV counselling and testing, which may have enhanced the quality of counselling provision. These results raise questions around how counselling is conducted and how well women understand and retain the information they are given. Also, what makes counselling memorable to women?

BMJ Global Health
An analysis of DHS service provision assessment data from Haiti, Malawi and Senegal collected between 2012 and 2014 found that the strongest predictor of client ANC knowledge was when client and observer reports agreed that counselling on a given topic had been performed. 29 This study also found that client and observer agreement that counselling had taken place was generally low and suggested that poor-quality counselling was a factor in client acknowledgement that counselling had occurred. Two exceptions to the relatively low validity of counselling-type indicators documented in this study were counselling related to family planning and whether the provider discussed breast feeding or infant feeding with the mother, which met both validation criteria in two countries, respectively. Notably, these two indicators had higher SE than other counselling-type indicators. It is possible that in these cases counselling was paired with an observable action such as breastfeeding demonstration or being shown or receiving a family planning method.
The more accurate recall of physical aspects of care, as opposed to advice or information given, has mixed support in prior validation studies. A PNC recall study of similar design in Kenya and eSwatini found that five of the same six indicators of maternal physical examination as measured in this study met both validation criteria in at least one of the two countries. 16 However, the Kenya and eSwatini study also found somewhat better recall of counselling indicators. Two counselling indicators (whether the provider discussed danger signs for the mother or gave information on STIs and HIV) met both validation criteria in either Kenya or eSwatini, whereas neither of these indicators met the benchmark in any setting of the present study. Another study in China, which compared women's reports with facility records, found generally lower SP for indicators of the maternal physical examination during ANC and PNC than either the present study or the only other known PNC validation study. 19 The China study did not include counselling-type indicators. Notably, women included in the China survey had delivered at least one live birth in the country in the 5 years preceding the survey and therefore had a substantially longer recall period.
Another notable trend observed was that indicators related to care for the mother herself were more accurately recalled than indicators of care for the newborn in PNC. Across indicators, SP was on average lower for newborn PNC interventions relative to maternal PNC interventions, demonstrating that women had a tendency to over-report newborn interventions received. The same trend for lower SP was observed in a prior validation study conducted in Kenya and eSwatini for newborn postnatal interventions. 16 Overall, a similar proportion of newborn indicators met both validation criteria in at least one country in the present study (five of six), as in the McCarthy et al study 16 conducted in Kenya and eSwatini (four of five). Three indicators-whether the provider discussed breast feeding/infant feeding with the mother, weighed the baby or immunised the baby-had acceptable validity in at least one setting in each of the studies.
While broad trends in the types of indicators accurately recalled are apparent, the substantial variation across samples raises questions about what characteristics of the respondent or setting influence recall accuracy. For example, for both ANC and PNC, indicators tended to have the highest SE and SP in Bangladesh, while these were lower in Cambodia and Kenya. These findings lead to questions as to whether differences in respondent characteristics contribute to differences in recall accuracy. For example, are women with a greater number of prior births primed to recall interventions received because they are more knowledgeable about the standard of care? Or does the expectation of type of care that should be received (eg, in a higher-tier facility or in a setting where an intervention is near universally vs rarely practised) matter more? It is also possible that respondent expectations of care based on facility attributes, random variation or differences in data training protocols account for discrepancies by setting. Such questions warrant further investigation and highlight the need for validation research in additional settings.
A strength of this study is the inclusion of a large number and range of type of facilities in each setting. Greater variation in facility practices addresses sample size limitations due to lack of variation in facility practices in prior validation research. 13 14 16 Examination of validation results in three different countries also gives insight into how individual validity may vary by setting and which indicators tended to perform most consistently. However, this study also has several limitations. While direct observation by a third-party observer is considered to be the gold standard, it may also be imperfect. Differences in observer training protocols, facility practices or how apparent it was that a given intervention was implemented, among other factors, may contribute to differences in observer ratings across settings. For example, observation of counselling interventions may have been more subjective regarding whether counselling took place. This may have contributed to lower SE and SP for counselling-type indicators. Finally, summary accuracy criteria should be interpreted with care. Global measures of test accuracy fail to distinguish between false negative and false positive test errors. We also caution against generalising the population-based results of this study to other settings, depending on the prevalence of the intervention. We have previously demonstrated how indicator properties established in this study can be extended to contexts with varying levels of intervention coverage. 13 An important consideration of the relevance of study findings for national and global monitoring efforts is that this study assessed women's immediate recall accuracy (at facility discharge). Results may not be directly generalisable to the DHS and MICS, which typically ask women to recall events related to a birth in the 2-5 years prior. 
While immediate recall may represent a best-case scenario in terms of accuracy, findings inform the degree to which women perceived specific interventions took place. Prior evidence suggests that, unless interventions are recalled with high accuracy at facility discharge, recall generally declines with time. For example, validation analysis of facility-based interventions received in the intrapartum and immediate postnatal periods in Kenya showed that the few select interventions which were recalled with high accuracy at facility discharge maintained acceptable accuracy at 13-15 months' follow-up. 15 However, for most indicators, recall accuracy was poor and either remained the same or declined with time. Another study which assessed maternal recall of infant birth weight among women in Nepal at recall periods ranging from 1 to 24 months postdelivery found that recall was generally poor and relatively uninfluenced by length of follow-up. 30 While additional research evaluating different lengths of recall time is warranted, it is possible that high immediate recall is necessary in order to 'code' certain events into memory for later reporting.
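The earlier caution about generalising population-level results to settings with different intervention coverage can be illustrated with a short calculation. The sketch below holds SE and SP fixed at the values reported in the results for the Bangladesh ANC blood test indicator (SE 71.6%, SP 98.0%) and applies the IF equation from the Methods at three coverage levels. The coverage levels and the helper name `inflation_factor` are hypothetical and chosen only for illustration.

```python
def inflation_factor(prevalence, sensitivity, specificity):
    """IF = Pr / P, where Pr = P * (SE + SP - 1) + (1 - SP)."""
    reported = prevalence * (sensitivity + specificity - 1) + (1 - specificity)
    return reported / prevalence


# SE and SP as reported in the text for the Bangladesh ANC blood test indicator.
SE, SP = 0.716, 0.980

# Hypothetical true coverage levels, for illustration only.
inflation_by_coverage = {
    p: round(inflation_factor(p, SE, SP), 3) for p in (0.2, 0.5, 0.8)
}
```

With SE and SP held constant, the IF still moves with coverage: for these inputs it is about 0.80 at 20% coverage, but about 0.74 and 0.72 at 50% and 80% coverage, which is just outside the 0.75 to 1.25 benchmark. The same indicator could therefore pass or fail the population-level criterion depending on how common the intervention is.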
In the context of calls for enhanced measurement of the components that lead to effective coverage, study findings such as these suggest that careful consideration of the type of information women are asked to recall is needed. While household survey programmes such as the DHS and MICS are frequently relied on as data sources for measuring intervention coverage, findings should be triangulated with other data sources such as routine data from health information systems. As new indicators are proposed, they should be subject to validity tests in a range of settings and recall periods. Variations in question wording and sequence should be systematically tested to investigate whether validity can be improved.

CONCLUSION
This study extends the scant evidence base on aspects of antenatal and postnatal interventions that women can accurately recall and report on in household surveys. In contrast to prior validation studies of intrapartum and immediate PNC (within 1 hour of birth), we find women are able to recall with accuracy some aspects of antenatal and routine PNC. While we note some trends in reporting accuracy, such as generally more accurate recall for indicators related to observable (eg, maternal physical examination) rather than counselling (eg, discussion of STIs or HIV) interventions, considerable variability in results by survey question and setting is also evident.