
Development of measures for assessing mistreatment of women during facility-based childbirth based on labour observations
  1. Blair O Berger1,
  2. Donna M Strobino1,
  3. Hedieh Mehrtash2,
  4. Meghan A Bohren3,
  5. Kwame Adu-Bonsaffoh4,
  6. Hannah H Leslie5,
  7. Theresa Azonima Irinyenikan6,
  8. Thae Maung Maung7,
  9. Mamadou Dioulde Balde8,
  10. Özge Tunçalp9
  1. 1 Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  2. 2 Department of Sexual and Reproductive Health and Research, including UNDP/UNFPA/UNICEF/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction (HRP), World Health Organization, Geneve, Switzerland
  3. 3 Centre for Health Equity, University of Melbourne School of Population and Global Health, Melbourne, Victoria, Australia
  4. 4 Department of Obstetrics and Gynaecology, School of Medicine and Dentistry, University of Ghana, Accra, Greater Accra, Ghana
  5. 5 Division of Prevention Science, University of California, San Francisco, San Francisco, California, USA
  6. 6 Obstetrics and Gynaecology, University of Medical Sciences Teaching Hospital, Akure, Ondo, Nigeria
  7. 7 Department of Medical Research, Ministry of Health and Sports, Yangon, Myanmar
  8. 8 Cellule de Recherche en Sante de la Reproduction en Guinee (CERREGUI), University National Hospital-Donka, Conakry, Guinea
  9. 9 UNDP/UNFPA/ UNICEF/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction (HRP), Department of Sexual and Reproductive Health and Research, World Health Organization, Geneve, Switzerland
  1. Correspondence to Dr Blair O Berger; blair.berger{at}


Introduction Mistreatment of women during childbirth is increasingly recognised as a significant issue globally. Research and programmatic efforts targeting this phenomenon have been limited by a lack of validated measurement tools. This study aimed to develop a set of concise, valid and reliable multidimensional measures for mistreatment using labour observations applicable across multiple settings.

Methods Data from continuous labour observations of 1974 women in Nigeria (n=407), Ghana (n=912) and Guinea (n=655) from the cross-sectional WHO multicountry study ‘How women are treated during facility-based childbirth’ (2016–2018) were used. Exploratory factor analysis was conducted to develop a scale measuring interpersonal abuse. Two indexes were developed through a modified Organisation for Economic Co-operation and Development approach for generating composite indexes. Measures were evaluated for performance, validity and internal reliability.

Results Three mistreatment measures were developed: a 7-item Interpersonal Abuse Scale, a 3-item Exams & Procedures Index and a 12-item Unsupportive Birth Environment Index. Factor analysis results showed a consistent unidimensional factor structure for the Interpersonal Abuse Scale in all three countries based on factor loadings and interitem correlations, indicating good structural construct validity. The scale had a reliability coefficient of 0.71 in Nigeria and approached 0.60 in Ghana and Guinea. Low correlations (Spearman correlation range: −0.06–0.19; p≥0.05) between mistreatment measures supported our decision to develop three separate measures. Predictive criterion validation yielded mixed results across countries. Both items within measures and measure scores were internally consistent across countries; each item co-occurred with other items in a measure, and scores consistently distinguished between ‘high’ and ‘low’ mistreatment levels.

Conclusion The set of concise, comprehensive multidimensional measures of mistreatment can be used in future research and quality improvement initiatives targeting mistreatment to quantify burden, identify risk factors and determine its impact on health and well-being outcomes. Further validation and reliability testing of the measures in other contexts is needed.

  • maternal health
  • obstetrics
  • public health
  • cross-sectional survey

Data availability statement

Data are available on request. The analytical study dataset from the 'How women are treated during facility-based childbirth' WHO study is deidentified and archived through WHO/HRP’s electronic record management system. Data requests with an expression of interest in pursuing multicountry secondary analyses with a specific research question can be made to More information about the study tools is available here: and the primary publication from the study here:

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Key questions

What is already known?

  • There is growing recognition that mistreatment of women during facility-based childbirth is a widespread issue impacting women’s experiences of care, and it is a deterrent to future care seeking.

  • There remains a lack of consensus on operational definitions of measures and constructs of mistreatment, as well as validated measures.

What are the new findings?

  • Using data from continuous labour observations of women in Nigeria, Ghana and Guinea from the WHO multicountry study ‘How women are treated during facility-based childbirth’, we developed a set of three concise mistreatment measures: a 7-item Interpersonal Abuse Scale, a 3-item Exams & Procedures Index and a 12-item Unsupportive Birth Environment Index.

  • The three mistreatment measures showed strong preliminary construct validity and internal consistency.

  • Further validation and reliability testing of the measures in other contexts is needed.

What do the new findings imply?

  • Validated measures are essential to quantify the scale of mistreatment as a global issue, identify risk factors and determine its impact on health outcomes.

  • Our study presents a set of three short measures of mistreatment that can be used as internally consistent measures with adequate preliminary validity to assess multiple dimensions of mistreatment during childbirth.

  • These measures can be adapted and used in future research and quality improvement initiatives to quantify the burden, frequency and overlap of multiple types of mistreatment in a standardised way that can be compared across studies, settings and time periods.


There have been significant declines globally in maternal morbidity and mortality following strategies that have targeted resource availability, logistics and access, and strengthening maternity care systems on the premise that if women reach facilities, many barriers to improving maternal health outcomes would be overcome.1–3 However, previous research has underscored that high coverage of essential technical interventions alone does not yield expected improvements in maternal health; rather, a focus on measuring and improving quality of care, including interpersonal aspects and experiences of care, is critical.4–6 A recent Lancet Global Health Commission on High-Quality Health Systems concluded that poor-quality care has become a larger barrier to reducing mortality than inadequate access, and that most health system measures fail to capture metrics that are important to people, including user experience, confidence in the health system and health outcomes.7–9

There is growing recognition globally that a key determinant for women seeking facility-based obstetric care is how they are treated during childbirth. Several terms have been used to describe poor care and abusive, disrespectful, negligent or discriminatory treatment of women giving birth in facilities (hereafter referred to in short as ‘mistreatment’).1 10–12 These terms have been framed as a subset of the larger issues of violence against women, human rights violations, quality of care, health systems issues or a combination of these.13 Bowser and Hill’s landmark landscape analysis was the first to review existing evidence and convene an expert working group to develop a classification system for ‘disrespect and abuse during facility-based childbirth’.14 A global mixed-methods systematic review of mistreatment by Bohren et al. (2015) documented widespread mistreatment across 34 countries and resulted in the first evidence-based typology of mistreatment (the WHO typology), which included domains of physical abuse, verbal abuse, sexual abuse, stigma and discrimination, failure to meet professional standards of care, poor rapport and communication between women and providers, and health systems conditions and constraints.12

Recent prevalence estimates of mistreatment range widely across studies from 13% to 98%.11 15–29 Definitions of mistreatment, measurement tools, study designs, data collection modes and study samples vary significantly, posing challenges in comparing both the findings and the validity of the measurement approaches. Mistreatment is often measured as a binary outcome of experiencing at least one kind of mistreatment, particularly when risk factors are assessed.17 19 30–35 This single binary indicator captures a wide range of behaviours and events included in instruments to characterise mistreatment, thus limiting the ability to distinguish different types of mistreatment and to assess overlap and co-occurrence of these experiences.36–40 There has also been growing momentum to develop tools that capture women’s experiences during childbirth. Two scales were recently developed to measure perceptions of respectful maternity care and person-centred care.22 41–43 However, the focus of the scales is broader than specifically capturing forms of mistreatment as they seek to situate respectful maternity care as a core feature of person-centred care.22 41–43

There is also debate over a standard mode of assessment for mistreatment, with previous studies using direct labour observations, facility exit interviews or women’s reports via community surveys.11 16 28 44–46 Both labour observations and women’s reports offer important perspectives that contribute to understanding the forms and magnitude of mistreatment from unique vantage points. Two recent studies assessing discordance between self-report and labour observation reports of mistreatment in Kenya and India found that observers reported significantly higher mistreatment than women, with the authors acknowledging a complex array of social norms, expectations of care, power dynamics and recall issues during birth that may contribute to women under-reporting mistreatment.44 46

Quantitative studies have been critical in providing preliminary estimates of the burden of mistreatment across contexts. However, the evidence base underpinning them has been hampered by lack of consensus on operational definitions and constructs of mistreatment, as well as validated measures. Validated measures are essential to quantify the scale of the problem, identify risk factors and determine its impact on health outcomes. This analysis aimed to develop a set of concise, valid and reliable measures for mistreatment of women during facility-based childbirth in the following dimensions: interpersonal abuse, inappropriate conduct of exams and procedures, and unsupportive birth environment, using labour observation data of women in three countries (Nigeria, Ghana and Guinea).


Patient and public involvement

A technical consultation with representatives from advocacy groups, non-governmental organisations, research organisations, universities, professional associations and United Nations agencies was held at the WHO in November 2013, and informed the research questions and design of survey instruments in the WHO study from which data for this analysis were obtained. Women who recently gave birth were involved in exploratory formative research informing tool development, content validity testing and providing feedback on the validity testing of the community survey tool.47–54

Data source and study participants

This secondary data analysis used data from the cross-sectional WHO multicountry study ‘How women are treated during facility-based childbirth’, conducted in two phases between 2014 and 2018 in four purposively selected countries: Ghana, Guinea, Nigeria and Myanmar. The first study phase involved a mixed-methods systematic review and primary qualitative research to develop two measurement instruments for mistreatment: a direct labour observation tool and a postpartum community survey tool. The second study phase implemented and validated the instruments. Details of the WHO study methodology are described elsewhere.10 47 48 Data for this analysis were obtained from the labour observation tool from phase 2 of the study conducted in Nigeria, Ghana and Guinea (labour observations were not conducted in Myanmar).

A total of 2016 women were observed during childbirth in three purposively selected public, urban, secondary-level or higher facilities in each country. Observations occurred in Nigeria from September 2016 to February 2017 and in Ghana and Guinea from August 2017 to February 2018. This analysis included 1974 women with complete data on the mistreatment items (407 in Nigeria, 912 in Ghana and 655 in Guinea); 42 women were excluded due to missing data (n=1 in Nigeria, n=14 in Ghana and n=27 in Guinea). All pregnant women entering study facilities without visible signs of distress or obstetric emergency were approached and screened for eligibility on admission. Women were eligible if they were in active labour and admitted to the facility for childbirth, were at least 15 years of age and provided written consent. Participating women were continuously observed by a trained, non-clinical female observer from admission to 2 hours postpartum. Observers completed a structured instrument collecting information on maternal sociodemographic characteristics and reproductive history, use of medical interventions, maternal and neonatal outcomes, and all incidents of mistreatment involving women, providers and facility staff. Observers recorded four aspects of each mistreatment incident: the form of mistreatment, the time it occurred, whether it occurred during the intrapartum or postpartum period, and who committed it; when the same type of mistreatment occurred repeatedly, a separate incident report was recorded for each incident.47
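The sample accounting above can be checked with a few lines of arithmetic. This is a minimal Python sketch. The per-country numbers of women observed are derived by adding the reported exclusions to the reported analytic samples; they are not stated directly in the text.

```python
# Analytic sample and exclusions (missing mistreatment data), as reported in the text.
analysed = {"Nigeria": 407, "Ghana": 912, "Guinea": 655}
excluded = {"Nigeria": 1, "Ghana": 14, "Guinea": 27}

# Women observed per country = analysed + excluded (derived, not reported directly).
observed = {country: analysed[country] + excluded[country] for country in analysed}

total_observed = sum(observed.values())   # should equal the 2016 women observed
total_analysed = sum(analysed.values())   # should equal the 1974 women analysed
total_excluded = sum(excluded.values())   # should equal the 42 women excluded

print(observed)                                       # {'Nigeria': 408, 'Ghana': 926, 'Guinea': 682}
print(total_observed, total_analysed, total_excluded) # 2016 1974 42
```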

Development and preliminary validation of measurement tools

The WHO typology of mistreatment served as the basis for items included in the WHO survey instruments and the item pool for the mistreatment measures developed in this analysis.12 Final item development, including face and content validation of the study instruments, occurred through expert review by a group of global maternal health experts, who determined the relevance of each item to the desired construct and suggested additional items where needed. The labour observation instrument was then piloted with a convenience sample of women in a single study site in Nigeria.47

Dimensions of mistreatment

We use the term ‘mistreatment’ for the incidents represented in the measures, as this terminology underscores that mistreatment can be intentional or unintentional and occurs at two levels: (1) the interpersonal level between provider and patient and (2) the systems level, through failures of the health facility and health system.12 We posit that the WHO typology contains domains of mistreatment that do not share a single common underlying factor. We thus operationalised mistreatment as capturing a spectrum of behaviours and experiences. The latent construct underlying physical abuse, verbal abuse and stigma and discrimination is related to interpersonal abuse within a more general violence framework, while the latent construct underlying failures to meet professional standards of care, poor rapport between women and providers, and health systems conditions and constraints is intrinsically tied to mistreatment in the process of care within a broader quality of care framework.

Statistical methods

All analyses were conducted using Stata V.15.0.55 More detailed information on the analytical approach is provided in the online supplemental file.


Item construction and identifying dimensions

A preliminary pool of 56 binary items in the labour observation tool was used to develop the mistreatment measures, based on five domains of the WHO typology. Sexual abuse was not captured in the WHO study instruments. We did not include stigma and discrimination because preliminary data evaluation indicated a uniformly low frequency of items in this domain across the three country samples.

Extensive exploratory data analysis on the three country samples involved an inductive approach to assess item distributions and to determine how best to operationalise dimensions of mistreatment in the measures. We iteratively assessed potential combinations of items based on conceptual consistency, practicality and understandability across study countries in an effort to maintain face and content validity. All mistreatment items were constructed as binary (‘0=no mistreatment’ and ‘1=mistreatment’). Where applicable, items were reverse coded to be consistent with this structure. Responses in the ‘not applicable’ or ‘don’t know’ categories were coded as 0, rather than omitted, to give the most conservative estimate of the mistreatment measures.
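The coding rules above can be sketched as a small helper: binary items, reverse coding where needed, and ‘not applicable’/‘don’t know’ coded as 0. The function name and response strings are illustrative assumptions, and so is the use of the birth-position question as a reverse-coded example. None of these reflect the WHO instrument’s actual variable coding.

```python
# Hypothetical response labels; the real instrument's codes may differ.
NON_SUBSTANTIVE = {"not applicable", "don't know"}

def code_item(response: str, mistreatment_if: str = "yes") -> int:
    """Code one observed item as 1 (mistreatment) or 0 (no mistreatment).

    mistreatment_if="yes": the item records a mistreatment event,
        so a "yes" response is coded 1.
    mistreatment_if="no": the item records a supportive practice and is
        reverse coded (e.g. 'woman was asked her preferred birth position'),
        so a "no" response is coded 1.
    'Not applicable' and "don't know" are coded 0, the conservative choice.
    """
    r = response.strip().lower().replace("\u2019", "'")  # normalise curly apostrophe
    if r in NON_SUBSTANTIVE:
        return 0
    return 1 if r == mistreatment_if else 0

print(code_item("Yes"))                        # 1
print(code_item("no", mistreatment_if="no"))   # 1 (reverse coded)
print(code_item("Don't know"))                 # 0
```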

We developed and assessed the validity and reliability of a new set of concise measures covering three dimensions of mistreatment, identified on both empirical and conceptual bases: interpersonal abuse, inappropriate conduct of exams and procedures (referred to hereafter as ‘exams & procedures’) and unsupportive birth environment. We initially planned to develop a single psychometric scale or a set of subscales for mistreatment. However, a series of tests (Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin test for sampling adequacy) established factorability only for items in the interpersonal abuse dimension. The final set of scored measures for mistreatment thus included one psychometric scale measuring interpersonal abuse (a reflective indicator, in which items are theoretically homogeneous and highly correlated) and two indexes measuring exams & procedures and unsupportive birth environment (formative indicators, in which items have different underlying causes, homogeneity is not assumed, and each item contributes uniquely to the level of the measured construct).56 57
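The sketch below computes the two factorability checks named above from an item correlation matrix using NumPy: Bartlett’s test statistic and the overall Kaiser-Meyer-Olkin (KMO) measure. It is an illustrative re-implementation of the standard formulas, not the Stata routines used in the analysis. It accepts any positive-definite correlation matrix (tetrachoric, as in the analysis, or Pearson). The p-value for Bartlett’s statistic, which is chi-square with p(p−1)/2 degrees of freedom, is left to a chi-square survival function such as `scipy.stats.chi2.sf`.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test that the p x p correlation matrix R is an identity matrix.

    Returns (chi-square statistic, degrees of freedom). A large statistic
    suggests the items share enough common variance to be factored.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("correlation matrix must be positive definite")
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy.

    Compares squared correlations with squared partial correlations
    (from the inverse of R); values nearer 1 indicate more factorable data.
    """
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal elements only
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)
```

For example, `bartlett_sphericity(R, n=912)` would test a Ghana-sized sample, given a correlation matrix `R` estimated from that sample.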

Interpersonal Abuse Scale development: psychometric analysis

The Interpersonal Abuse Scale was developed using a psychometric approach outlined by DeVellis and Netemeyer et al.58 59 Exploratory factor analysis (EFA) was conducted separately by country to assess the construct validity and reliability of the scale.58 59 Operationalising interpersonal abuse as a dimension of mistreatment is still at an early stage of development. Confirmatory factor analysis (CFA) was therefore not yet warranted, because CFA tests a strong, theoretically specified a priori factor structure of a construct.37 38 58 59 A principal components analysis (PCA) was conducted on tetrachoric correlation matrices to determine the number of common factors to extract, based on four criteria: Kaiser’s rule of retaining components with eigenvalues >1.0, the ‘bend’ in the scree plot (plot of eigenvalues), the proportion of variance explained by the factors and parallel analysis.58–60 EFA was then performed using iterated principal factor estimation. To establish unidimensionality, items with either low (λ<0.40) or very high (λ>0.90, indicating possible redundancy) factor loadings were omitted.58 59
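The factor-retention and item-screening rules above can be sketched as follows, under stated assumptions. Eigenvalues come from a supplied correlation matrix (the analysis used tetrachoric matrices). The parallel-analysis benchmark is the mean eigenvalue, position by position, of correlation matrices from uncorrelated normal data of the same size; implementations differ, and some use the 95th percentile or simulate binary data. The iterated principal factor extraction itself is not shown.

```python
import numpy as np

def eigenvalues_desc(R):
    """Eigenvalues of a symmetric correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]

def kaiser_count(R):
    """Kaiser's rule: number of components with eigenvalue > 1.0."""
    return int(np.sum(eigenvalues_desc(R) > 1.0))

def parallel_analysis(R, n, n_sims=200, seed=0):
    """Simplified Horn's parallel analysis.

    Retains leading components whose observed eigenvalue exceeds the mean
    eigenvalue at the same position from correlation matrices of
    uncorrelated standard-normal data with n rows and p columns.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        X = rng.standard_normal((n, p))
        sims[s] = eigenvalues_desc(np.corrcoef(X, rowvar=False))
    benchmark = sims.mean(axis=0)
    keep = 0
    for observed, random_mean in zip(eigenvalues_desc(R), benchmark):
        if observed <= random_mean:
            break
        keep += 1
    return keep

def screen_loadings(loadings, low=0.40, high=0.90):
    """Indices of items to keep: low <= |loading| <= high.

    Items below `low` load weakly; items above `high` may be redundant.
    """
    return [i for i, lam in enumerate(np.abs(np.asarray(loadings, dtype=float)))
            if low <= lam <= high]
```

With a seven-item matrix in which every pair of items correlates at 0.3, both `kaiser_count` and `parallel_analysis` point to a single factor. That is the pattern a unidimensional scale would be expected to show.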

Two types of construct validity (structural and cross-cultural) were assessed in three ways during the psychometric analyses. First, standardised loadings >0.40 on a single dominant factor and, second, high interitem correlations provided evidence of structural construct validity. Third, a consistent factor structure across the three country samples provided further evidence of structural and cross-cultural construct validity (measurement invariance).60–62 Corrected item-to-total correlations, interitem correlations and the Kuder-Richardson-20 (KR-20) coefficient (the counterpart of Cronbach’s alpha for scales based on dichotomous items) were used to assess scale reliability.37 38 58 59 Corrected item-to-total and interitem correlations in the range 0.15–0.50 were taken as evidence of adequate internal consistency; this range has been posited as acceptable for new measures in early stages of scale development.58–60
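The reliability statistics named above can be computed directly from a women-by-items matrix of 0/1 codes. The formula is KR-20 = k/(k−1) × (1 − Σ pⱼ(1−pⱼ) / σ²ₓ), where pⱼ is the proportion scoring 1 on item j and σ²ₓ is the variance of the total score. The following NumPy sketch is illustrative, not the authors’ Stata code. It uses population (divide-by-n) variances, in which case KR-20 coincides with Cronbach’s alpha for dichotomous items. The corrected item-to-total correlation correlates each item with the sum of the remaining items, so an item is never correlated with itself. Correlations here are Pearson correlations of 0/1 items (phi coefficients), not the tetrachoric correlations used for the factor analysis.

```python
import numpy as np

def kr20(X):
    """Kuder-Richardson 20 for an (n_women x k_items) matrix of 0/1 codes."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                # proportion scoring 1 on each item
    item_var = np.sum(p * (1 - p))    # sum of item variances (population form)
    total_var = X.sum(axis=1).var()   # variance of total scores (ddof=0)
    return (k / (k - 1)) * (1 - item_var / total_var)

def corrected_item_total(X):
    """Correlation of each item with the sum of the *other* items."""
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                     for j in range(X.shape[1])])

def interitem_correlations(X):
    """Pairwise phi (Pearson) correlations between 0/1 items."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
```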

An approach informed by clinimetrics was used to develop the Exams & Procedures Index and the Unsupportive Birth Environment Index in the three settings. The method was adapted from three sources: the Organisation for Economic Co-operation and Development (OECD) procedure for composite index development,63 the indexing approach outlined by the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) initiative,61 62 and existing indices measuring quality of care in maternal and reproductive health.64–68 The Exams & Procedures Index was constructed using the denominator of women who received at least one vaginal examination and/or at least one procedure (caesarean, episiotomy, hysterectomy, tubal ligation or postpartum intrauterine device (IUD) insertion). Items occurring in >90% of observations in all three countries were omitted because they would limit the ability of the measures to discriminate between levels of mistreatment and would artificially inflate index scores. Infrequently occurring items (<5%) in all three settings were considered for deletion. They were ultimately retained, based on expert feedback and their limited effect on scoring, to improve content validity and err on the side of overinclusion; a primary goal in indexing is to include enough items to fully cover the content of the measured dimension, since formative items define the latent construct.69 70
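The prevalence rules for index items can be expressed as a short screening function: drop items above 90% in all three countries, and flag (but do not automatically drop) items below 5% in all three countries. In this hypothetical sketch, each country’s matrix is assumed to be already restricted to the relevant denominator. A helper for the Exams & Procedures denominator is included. The function names, and the choice to expose the thresholds as parameters, are illustrative.

```python
import numpy as np

def exams_procedures_denominator(n_vaginal_exams, had_procedure):
    """Mask of women with at least one vaginal exam and/or at least one procedure."""
    return (np.asarray(n_vaginal_exams) > 0) | np.asarray(had_procedure, dtype=bool)

def screen_index_items(items_by_country, upper=0.90, lower=0.05):
    """Classify index items by prevalence across countries.

    items_by_country: dict mapping country -> (n x k) 0/1 array, rows already
    restricted to the relevant denominator, same k items in every country.
    Returns (drop, low_freq): indices of items with prevalence > upper in ALL
    countries (candidates to drop) and < lower in ALL countries (reviewed
    for content validity, not dropped automatically).
    """
    prevalence = np.array([np.asarray(X, dtype=float).mean(axis=0)
                           for X in items_by_country.values()])  # countries x items
    drop = np.where((prevalence > upper).all(axis=0))[0].tolist()
    low_freq = np.where((prevalence < lower).all(axis=0))[0].tolist()
    return drop, low_freq
```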

Items in all three mistreatment measures were scored as ‘1’ if there was at least one report of the incident and ‘0’ otherwise, so that higher scores indicate a greater presence of different types of mistreatment. Scores were aggregated separately by measure using simple summative scoring without weighting. The range of scores by measure was: Interpersonal Abuse Scale (0–7), Exams & Procedures Index (0–3) and Unsupportive Birth Environment Index (0–12).
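The observation tool produced one incident report per occurrence. Scoring therefore first collapses repeated reports of the same item into a single indicator and then sums the indicators without weights. This minimal sketch assumes incident reports are available as (woman, item) pairs. The item labels are hypothetical placeholders, not the actual items listed in table 2.

```python
from collections import defaultdict

def score_measure(incidents, woman_ids, measure_items):
    """Unweighted summative score per woman for one measure.

    incidents: iterable of (woman_id, item_label) incident reports.
    Repeated reports of the same item count once (item scored 1 if there is
    at least one report), so scores range from 0 to len(measure_items).
    """
    items_seen = defaultdict(set)
    for woman, item in incidents:
        if item in measure_items:          # ignore items belonging to other measures
            items_seen[woman].add(item)
    return {w: len(items_seen[w]) for w in woman_ids}

# Hypothetical item labels, for illustration only.
abuse_items = {"verbal_item_a", "verbal_item_b", "physical_abuse_any"}
reports = [(1, "verbal_item_a"), (1, "verbal_item_a"), (1, "physical_abuse_any"),
           (2, "no_bed"), (3, "verbal_item_b")]
print(score_measure(reports, [1, 2, 3, 4], abuse_items))  # {1: 2, 2: 0, 3: 1, 4: 0}
```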

Measure validation and reliability assessment

Construct validity of the three measures was assessed via Spearman rank correlations between mistreatment measures, using the benchmark of correlations <0.30 (p≥0.05) as evidence that the measures were not significantly correlated and therefore captured distinct dimensions.59 The ability to establish concurrent criterion validity was limited by the lack of a ‘gold-standard’ measure of mistreatment, as is commonly the case with social and psychological constructs.58 Predictive criterion validity was evaluated through bivariate logistic regressions of two outcomes on ordinal mistreatment measure scores: a global measure of satisfaction with care, and intentions to give birth in the same facility in the future. Both outcomes came from linked data on the same women’s responses to the WHO community survey tool, administered up to 8 weeks postpartum (see online supplemental appendix 1 for details).
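Spearman’s rho between two measures is the Pearson correlation of their ranks. Tied scores, which are very common for short ordinal scores such as 0–3, are given the average of the ranks they span. The NumPy sketch below illustrates this together with the <0.30 benchmark. It omits the p-value, which in practice could be obtained from `scipy.stats.spearmanr`. The function names and example values are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def weakly_correlated(x, y, benchmark=0.30):
    """True if |rho| falls below the benchmark used in the text."""
    return abs(spearman_rho(x, y)) < benchmark
```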

Internal consistency of items within each measure was assessed through the percent distributions of co-occurring mistreatment items; the proportion of items that occur with at least one or several other items in a measure tends to be high in internally consistent measures. To determine consistency of measure scores (i.e., whether they can consistently distinguish between ‘high’ and ‘low’ scoring groups), the proportions of women experiencing a mistreatment item who had ‘high’ and ‘low’ scores were calculated, with levels determined by those scoring above or below the country-specific mean score on the measure.71
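Both internal-consistency checks reduce to simple tabulations of a women-by-items 0/1 matrix for one measure in one country. The sketch below is a hypothetical illustration. Co-occurrence is taken as the share of women with a given item who had at least one other item in the same measure. The ‘high’/‘low’ comparison computes the prevalence of each item among women scoring above the mean and among the remaining women. It assumes women scoring exactly at the country mean fall in the ‘low’ group, a tie rule the text does not specify.

```python
import numpy as np

def co_occurrence(X):
    """Per item: share of women with the item who had >= 1 other item in the measure."""
    X = np.asarray(X, dtype=int)
    totals = X.sum(axis=1)
    shares = []
    for j in range(X.shape[1]):
        has_item = X[:, j] == 1
        # A woman with item j and a total >= 2 has at least one other item.
        shares.append(float(np.mean(totals[has_item] >= 2)) if has_item.any() else float("nan"))
    return shares

def high_low_prevalence(X):
    """Item prevalence among 'high' (score > mean) and 'low' (score <= mean) scorers."""
    X = np.asarray(X, dtype=int)
    scores = X.sum(axis=1)
    high = scores > scores.mean()
    return X[high].mean(axis=0), X[~high].mean(axis=0)
```

A measure whose scores discriminate consistently would show higher prevalence in the ‘high’ group for every item.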


Table 1 shows sociodemographic, provider and obstetric characteristics of the sample by country. The average maternal age at the time of birth was highest in Nigeria (29.3 years); the sample from Guinea (23.8 years) was markedly younger than the samples from Nigeria or Ghana. Women with no or primary education were uncommon in Nigeria (5.4%), where half (50.1%) of women had postsecondary education or higher, while nearly one-third of women in Ghana and two-thirds of women in Guinea had no or primary education.

Table 1

Individual, provider and birth characteristics of the study sample by country

Items included in the three measures of mistreatment (the 7-item Interpersonal Abuse Scale, 3-item Exams & Procedures Index and 12-item Unsupportive Birth Environment Index) are noted in table 2. Distributions of mistreatment items, mean and ranges of scores, and the proportion of women scoring above the country-specific mean are shown for the three measures by country in table 3. The Nigerian sample had the highest frequency on all items, and women in this sample scored highest on all three measures, approximately one point or more higher than women in either Ghana or Guinea. While country-specific means and ranges varied considerably, many women scored higher than their country-specific average on the measures, indicating that multiple, overlapping forms of mistreatment were observed in each dimension.

Table 2

Description of mistreatment items

Table 3

Distribution of mistreatment items by measure and country (N=1974, unless otherwise noted)

Psychometric analysis results: Interpersonal Abuse Scale

Seven of the initial 10 items in the physical and verbal abuse domains were retained during factor analysis in all three countries to yield the final Interpersonal Abuse Scale: one item about whether at least one form of physical abuse was observed, and six items about different forms of verbal abuse. While multiple physical abuse items were originally assessed, the low frequency of most of those items required collapsing them into a single item. An item about whether the woman was hissed at was dropped during expert consultations as it was deemed specific to the Nigerian context. Two items were also dropped due to very low frequency and low factor loading (λ<0.40): (1) woman was blamed for her or her baby’s poor outcome, and (2) woman received any other forms of verbal abuse (not otherwise categorised). The PCA indicated one dominant factor in all three countries based on a single component with an eigenvalue >1.0, results of the scree plots and graphical depictions of the parallel analyses (online supplemental appendix 4). Table 4 shows the results of EFA and internal reliability for the Interpersonal Abuse Scale. Factor analysis of the seven items supported a consistent one-factor structure; all items showed strong standardised factor loadings (λ>0.40) in all countries, providing evidence of scale unidimensionality and construct validity.

Table 4

Interpersonal Abuse Scale: results of exploratory factor analysis by country

The Interpersonal Abuse Scale showed adequate internal consistency in Nigeria (KR-20 coefficient=0.70). The KR-20 coefficients of 0.57 and 0.54 in Ghana and Guinea, respectively, did not provide evidence of strong internal consistency in these samples. However, apart from one or two low pairwise correlations in each country, the interitem correlations ranged from 0.31 to 0.62 in Nigeria, 0.29 to 0.58 in Ghana and 0.23 to 0.57 in Guinea, indicating internal consistency of the scale in all three countries (online supplemental appendix 2A).

All three items in the Exams & Procedures Index were retained. In the Unsupportive Birth Environment Index, the only item dropped because it occurred in >90% of observations in all three countries was an item about whether a woman was asked her preferred birth position (95.1% in Nigeria, 96.5% in Ghana and 97% in Guinea). The study team deemed that the conceptual relevance of this item to the construct assessed in the Unsupportive Birth Environment Index was not high enough to retain it. Four items in the Unsupportive Birth Environment Index were observed in <5% of women in all three countries. They were retained to maintain content validity in this dimension, based on team consensus of their relevance to the construct: (1) no interpreter used, (2) whether a woman was neglected, (3) whether a woman was directed to clean up blood or other fluids and (4) whether a woman did not have a bed at any time.

Validation analysis

Findings from the factor analysis demonstrated high interitem correlations and a consistent factor structure for a unidimensional Interpersonal Abuse Scale in the three country samples. These results provide evidence of high structural and cross-cultural validity, two important elements of construct validity. Spearman rank correlations between each pair of mistreatment measures were low and non-significant in all countries (all correlations were <0.30; range: −0.06–0.19; p≥0.05) (online supplemental appendix 2B). These results support the theoretical approach of developing separate measures for the three dimensions of mistreatment (interpersonal abuse, inappropriate conduct of exams and procedures, and unsupportive birth environment). They also provide evidence for the construct validity of three separate measures rather than a single composite measure. Analysis of predictive criterion validity, however, yielded mixed results for the Interpersonal Abuse Scale and null results for the two indexes across countries (online supplemental appendix 2C).

Reliability: internal consistency analysis

Each item in the measures across all three countries occurred with at least one other item in the measure, indicating internal consistency among the items, particularly for the Interpersonal Abuse Scale and the Unsupportive Birth Environment Index (online supplemental appendix 3). Tests of internal consistency showed strong consistency of scores in both the Interpersonal Abuse Scale and the Exams & Procedures Index. For every item in these two measures across settings, the proportion of women with the observed item was greater among women with ‘high’ scores than among women with ‘low’ scores, indicating that the scores consistently distinguished between levels of mistreatment. The Unsupportive Birth Environment Index showed adequate internal consistency of both items and scores, though several items did not vary by level of mistreatment due to low frequency (table 5).

Table 5

Percent distribution of women experiencing a mistreatment item by mistreatment score and country*


This study used a methodologically rigorous approach to shorten the 56-item WHO labour observation tool,47 which was developed based on five domains of the WHO typology of mistreatment of women during childbirth.12 A set of three concise, multidimensional measures of mistreatment was developed in this analysis: a 7-item Interpersonal Abuse Scale, a 3-item Exams & Procedures Index and a 12-item Unsupportive Birth Environment Index. The measures showed adequate preliminary internal reliability and construct validity in all three countries. Further validation, particularly of the two indexes, is needed given the mixed criterion validity results.

Three measures were developed rather than a single composite measure, underscoring the complex and multidimensional nature of mistreatment. The validity of this approach was supported by the low correlations between measures across settings. Mistreatment captures a broad range of behaviours and experiences, some of which may reflect more intentional forms of abuse while others reflect poor-quality care. The latter may be due to health system deficiencies such as inadequate resources, personnel or facility policies. They may also reflect norms in training around pragmatic strategies to establish professional distance or to maintain control and compliance during birth, in an effort to ensure expedient and/or good birth outcomes.13 49 53 72–75 Our measurement approach is consistent with measures operationalised in related research areas. Constructs of violence and abuse have been measured using psychometric scales in prior research,76–79 often through adaptations of the Conflict Tactics Scale.37 38 Quality of care in maternal and newborn health has typically been measured using clinimetric indexes and checklists that assess the quality of more heterogeneous clinical and service-related items.64–66 80

Separate measures of different mistreatment dimensions make it possible to distinguish between areas of childbirth care with higher and lower mistreatment scores, so that quality improvement responses or interventions can be tailored to the high-scoring dimensions. For example, high scores on the Interpersonal Abuse Scale or Exams & Procedures Index may be addressed through routine audit and feedback loops or interventions targeting provider behaviour and training, whereas elements of the Unsupportive Birth Environment Index could be addressed through a systems response focused on resources, infrastructure and policy.

Summative scoring without weighting was used to enable simple comparisons of these measures across time and settings. Weighting by the frequency of repeated incidents was considered for the Interpersonal Abuse Scale, since items in this measure could be based on multiple incident reports. However, accounting for frequency would embed a severity gradient in the scale scores that may not make sense conceptually or clinically. Although PCA is commonly used to derive weights, it yields weights specific to the dataset analysed, which was not appropriate for measures developed from multicountry data and designed to be compared across multiple settings.63 64 Data-derived weights would also have removed women’s experiences from the focus of mistreatment, placing an external valence on which mistreatment items are ‘worse’ than others. Such a ranking may be inconsistent across settings and individual experiences, as societal norms, context, expectations and preferences inform what is viewed as mistreatment.
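The difference between the unweighted summative scoring that was adopted and the frequency weighting that was considered and rejected can be shown in a few lines. In this sketch the identifiers `item_1` to `item_7` are hypothetical placeholders for the seven Interpersonal Abuse Scale items, and the input is assumed to be a mapping from item to the number of observed incidents.

```python
# Hypothetical placeholder identifiers for the seven Interpersonal Abuse Scale items
INTERPERSONAL_ABUSE_ITEMS = tuple(f"item_{i}" for i in range(1, 8))


def summative_score(incident_counts, items=INTERPERSONAL_ABUSE_ITEMS):
    """Unweighted summative score: each item adds 1 if it was observed at least once."""
    return sum(1 for item in items if incident_counts.get(item, 0) > 0)


def frequency_weighted_score(incident_counts, items=INTERPERSONAL_ABUSE_ITEMS):
    """Alternative that was considered but not adopted: every repeated incident adds 1."""
    return sum(incident_counts.get(item, 0) for item in items)


# Invented example: item_1 observed three times, item_2 once, item_5 not at all
counts = {"item_1": 3, "item_2": 1, "item_5": 0}
print(summative_score(counts))           # 2
print(frequency_weighted_score(counts))  # 4
```

The unweighted score is bounded by the number of items (0 to 7 here), whereas the frequency-weighted version grows with every repeated incident, which is the severity gradient the scoring approach avoids.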

Results of psychometric analyses support a 7-item unidimensional Interpersonal Abuse Scale, based on strong factor loadings of all seven items on a single factor in the three countries. While the scale captures multiple forms of verbal abuse, physical abuse was assessed as a single item because most forms of physical abuse were infrequent; this may skew scale scores towards verbal abuse and may limit our ability to understand nuance in these experiences. The physical abuse item includes two dimensions: being physically struck (pinched, kicked, slapped, punched or hit with an instrument) and other uses of physical restraint or force (being gagged, physically tied to the bed, held down forcefully on the bed, forceful downward pressure on the abdomen and other use of force). Future validation of this scale should examine the validity of including these dimensions as separate scale items, as the impetus for these forms of abuse may have different underpinnings.13 49–51 53 74 75 81 82 Although the low-frequency stigma and discrimination items did not scale in our samples, we note that these critical forms of mistreatment may be more common and observable in other contexts, and their inclusion in future versions of the measures should be assessed.
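The single physical abuse item and the two-item alternative suggested for future validation differ only in how observation codes are grouped. The sketch below assumes each observation is recorded as a set of codes; the code names are hypothetical stand-ins for the behaviours listed in the text, not the WHO tool's variable names.

```python
# Hypothetical observation codes standing in for the behaviours listed in the text
STRUCK = frozenset({"pinched", "kicked", "slapped", "punched", "hit_with_instrument"})
RESTRAINT_OR_FORCE = frozenset({
    "gagged",
    "tied_to_bed",
    "held_down_forcefully",
    "forceful_abdominal_pressure",
    "other_use_of_force",
})


def physical_abuse_single_item(observed_codes):
    """Current approach: one binary item, 1 if any form of physical abuse was observed."""
    return int(bool(observed_codes & (STRUCK | RESTRAINT_OR_FORCE)))


def physical_abuse_two_items(observed_codes):
    """Alternative for future validation: (struck, restraint or force) as separate binary items."""
    return int(bool(observed_codes & STRUCK)), int(bool(observed_codes & RESTRAINT_OR_FORCE))


print(physical_abuse_single_item({"held_down_forcefully"}))  # 1
print(physical_abuse_two_items({"held_down_forcefully"}))    # (0, 1)
```

Splitting the item would add one item to the scale, so any such change would need the factor analysis and reliability checks to be rerun.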

Most women scored 1 or greater on the Exams & Procedures Index across all three countries, indicating that inappropriate conduct of exams and procedures, unconsented care and breaches of privacy were common. Breaches of privacy and exposure of the woman’s body may be influenced by the facility birth environment, such as differences in physical layout, space limitations and crowding in study facilities. Sen et al. discussed normalised practices in low-income and middle-income country settings, such as orienting beds and labour tables in open wards towards nurse stations and entryways, which prioritise efficient monitoring of labour progress over women’s privacy, particularly when there are few providers.72 In contrast, unconsented care may be influenced by differences in norms or standards of clinical practice, perceived health literacy of the women, or power relations prioritising clinical decision making over two-way communication or women’s preferences.

Some items in the Unsupportive Birth Environment Index refer specifically to the birth environment and facility resources (e.g., availability of fluids, curtains or partitions, no bed or bed sharing) and are similar to those captured in widely used facility assessments such as the Service Provision Assessment (SPA) or the WHO Service Availability and Readiness Assessment (SARA). One important difference is that the mistreatment items are woman-centred: they reflect how women interact with their birth environments. A facility may technically have clean water available or enough beds to accommodate its monthly volume, but this index measures whether an individual woman actually had access to and used these resources. Two items in this index, no offer or denial of pain relief and not being offered a labour companion, occurred in the vast majority of observations, though they did not meet our omission criterion of >90% frequency in all three samples. They were retained because of their theoretical importance in contributing to an unsupportive birth environment; both items signal areas for improvement on two evidence-based WHO recommendations for a positive birth experience.83 Future refinement of the labour companion item should consider using the wording of the WHO community survey tool, which asks whether a woman was ‘allowed’ (rather than ‘offered’) to have a companion, as a more targeted assessment of denial of a woman’s preferences for companionship.
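The >90% frequency omission criterion can be expressed as a simple filter over per-country item frequencies. This sketch assumes frequencies are stored as proportions per country and that an item is flagged only when it exceeds the threshold in every sample, which is one reading of the criterion described above; the item names and frequencies in the example are invented. Decisions to retain items on theoretical grounds are a separate judgement that the filter does not capture.

```python
def items_meeting_omission_rule(freq_by_country, threshold=0.9):
    """Return items whose observed frequency exceeds `threshold` in every country sample.

    freq_by_country maps item -> {country: proportion of observations with the item}.
    """
    return sorted(
        item
        for item, freqs in freq_by_country.items()
        if freqs and all(f > threshold for f in freqs.values())
    )


# Invented frequencies for illustration only (not study results)
example = {
    "item_a": {"country_1": 0.95, "country_2": 0.93, "country_3": 0.97},
    "item_b": {"country_1": 0.95, "country_2": 0.85, "country_3": 0.97},
    "item_c": {"country_1": 0.20, "country_2": 0.10, "country_3": 0.30},
}
print(items_meeting_omission_rule(example))  # ['item_a']
```

Here `item_b` exceeds 90% in two samples but not the third, so under this reading it is not flagged, similar to the situation described for the pain relief and labour companion items.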

Future validation of all three measures should further assess criterion validity, given our mixed results from the predictive criterion analysis, as well as apply confirmatory validation approaches and further assess measurement invariance. However, these findings should be interpreted with caution, as it is difficult to determine whether the inconsistent results reflect the actual validity of the measures, the criteria against which they were assessed, or both. We used global measures of satisfaction with care and future intentions to deliver in the same facility, consistent with studies that have used these criteria to validate composite measures of respectful maternity care35 and person-centred maternity care.41 42 However, the extent to which women’s reports of satisfaction are consistently related to mistreatment, particularly third-party labour observations of interpersonal abuse, inappropriate conduct of exams and procedures or an unsupportive birth environment, is unclear. A recent analysis of the community survey data from the larger WHO study found that while over one-third of women experienced mistreatment, nearly 90% of women reported overall satisfaction with the care received.84 These results highlight the complexity of comprehensively measuring quality of care, given the coexistence of high satisfaction as an outcome measure (informed by expectations of care, empowerment, and knowledge of rights to high-quality and respectful care) with mistreatment as a discrete process measure of women’s experiences with health systems.9 84 Taken together, our confidence in relying on these criteria to firmly establish the validity of these measures is limited. As research in this field evolves and more measures of care quality and experience are developed and refined, our measures can be tested against other mistreatment measures to help determine their criterion validity.
Further refinement and expansion of items in the indexes may also be needed to maintain content validity over time, as the variety and scope of item content are particularly important for the validity of indexes.

These measures were developed using data from three West African countries; applying them in other geographical settings and types of facilities (health centres, district hospitals, tertiary hospitals) will help determine their construct validity and generalisability. While the measures showed good internal reliability, further reliability testing of the Interpersonal Abuse Scale in other settings is needed, as the reliability coefficients in Ghana and Guinea did not provide strong evidence of scale reliability. Inter-rater reliability assessments were not possible with these data but should be conducted in future studies to further establish the reliability of the measures, particularly since they are based on labour observations.

Strengths and limitations

A major strength of this study is the use of data from a multicountry study, adding to the robustness of the validation of the newly developed measures. A further strength of basing these measures on continuous labour observations is that they limit potential recall and social desirability biases that can accompany women’s reports of mistreatment. Continuous observation and real-time recording are particularly important for capturing mistreatment during birth, as the primary WHO study analysis found that mistreatment peaked around the time of birth.85

There are several limitations to this analysis. All study facilities were public and located in urban settings, limiting the generalisability of the mistreatment measures, given that previous research has found differences in how women are treated during childbirth by type of facility.15 21 22 42 86 87 Limited information was available in our data on technical and human resources in facilities and on provider characteristics such as gender and experience level; this information could be used to further assess construct validity, as prior research shows that the likelihood of mistreatment varies by provider characteristics.11 20 88 Assessing items in the Interpersonal Abuse Scale as binary rather than incorporating repeated incidents could have reduced data variability. Continuous observation poses a potential Hawthorne effect, though previous assessments of these data did not show a significant presence of this effect.10 Labour observations may not be the most accurate mode for assessing more experiential aspects of mistreatment (i.e., aspects that may be intrinsically based on women’s perceptions of the care they experienced, such as stigma or discrimination).9 Our measures may include items that are readily observed but ‘matter’ less to women’s personal experience of care, while omitting items that may be important to a woman’s birth experience, such as blaming women for poor outcomes. Triangulating labour observation measures with measures based on women’s reports of mistreatment, such as those developed in an analysis of the WHO community survey data,89 brings multiple perspectives together and can strengthen how rigorously this complex phenomenon is measured. Other methodological limitations of the study dataset and tools have been described previously.10

Implications for policy and practice

These measures can be adapted and used in future research on mistreatment to quantify the burden, frequency and overlap in multiple types of mistreatment in a standardised way that can be compared across studies, settings and over time.

The concise nature of the measures, compared to the 56-item WHO labour observation mistreatment instrument, offers an opportunity to incorporate them in longer surveys of other aspects of maternal health or for quality improvement initiatives. The measures can also be used in conjunction with facility assessment tools like the SPA or SARA to gain a better understanding of the relationship between the health system or facility context and dimensions of mistreatment. The elements of interpersonal behaviour and abuse as well as health system infrastructure and resource constraints in the measures allow for their use to assess multicomponent interventions that impact both interpersonal aspects of care and more systemic, structural aspects of the process of care. The measures can also translate evidence-based recommendations into indicators that can be used for routine measurement of mistreatment, quality assessments and monitoring and accountability efforts to improve quality of maternal healthcare at the country level.90–92 This kind of routine measurement is essential to monitor progress towards the promotion of high-quality, respectful birth experiences for all women.

Data availability statement

Data are available on request. The analytical study dataset from the 'How women are treated during facility-based childbirth' WHO study is deidentified and archived through WHO/HRP’s electronic record management system. Data requests with an expression of interest in pursuing multicountry secondary analyses with a specific research question can be made to More information about the study tools is available here: and the primary publication from the study here:

Ethics statements

Ethics approval

Institutional permission for recruitment and observation was obtained from each site; consent was not sought from providers. This study was approved by the WHO Ethical Review Committee (A65880), the WHO Review Panel on Research Projects and in-country ethics committees: Guinea (le comité national d’éthique pour la recherche en santé, the national ethics committee for health research); Nigeria (Federal Capital Territory Health Research Ethics Committee; Research Ethical Review Committee, Oyo State; and State Health Research Ethics Committee of Ondo State); Ghana (Ethical Review Committee of the Ghana Health Service; Ethical and Protocol Review Committee of the College of Health Sciences, University of Ghana); and Myanmar (Ethics Review Committee, Department of Medical Research).

Acknowledgments

We thank the data collection team for their excellent work and the women who participated in this study. We appreciate the thoughtful contributions of participants in the end-of-study investigator’s meeting. We also thank Dr. Karen Bandeen-Roche (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health) for her technical guidance during the conceptualisation and conduct of this analysis.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Handling editor Seye Abimbola

  • Twitter @blair_berger, @hediehmm, @otuncalp

  • Contributors BOB led the analysis, drafted the manuscript, and read and approved the final manuscript. DMS contributed to the analysis and drafting of the manuscript, and read and approved the final manuscript. HM provided input on the analysis and drafting of the manuscript, and read and approved the final manuscript. MAB provided input on the analysis and read and approved the final manuscript. KA-B, HHL, TAI, TMM and MDB provided input on the drafting of the manuscript and read and approved the final manuscript. OT provided input on the analysis and drafting of the manuscript, and read and approved the final manuscript.

  • Funding This research was made possible by the support of the American People through the United States Agency for International Development (USAID), and the UNDP/UNFPA/UNICEF/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction (HRP), Department of Reproductive Health and Research, WHO.

  • Disclaimer The contents of this article are the sole responsibility of the authors and do not necessarily reflect the views of USAID, the United States Government, WHO, or their individual institutions.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.