The causal effect of delivery volume on severe maternal morbidity: an instrumental variable analysis in Sichuan, China

Objective Findings regarding the association between delivery volume and maternal health outcomes are mixed, and most studies have examined correlation only. This study aims to identify the causal effect of delivery volume on severe maternal morbidity (SMM) in China.
Methods We analysed all women giving birth in Sichuan, a densely populated province of China with 83 million residents, during the fourth quarter of each of 4 years (2016 to 2019). Routinely collected discharge data, health institution annual report data and road network data were used for analysis. The maternal health outcome was measured by SMM. Instrumental variable (IV) methods were applied for estimation, with the surrounding average number of delivery cases per institution used as the instrument.
Results The study included 4545 institution-years of data from 1456 distinct institutions with delivery services, reflecting 810 049 associated delivery cases. The average SMM rate was approximately 33.08 per 1000 deliveries between 2016 and 2019. More than 86% of delivery services were provided by the third of institutions with the highest delivery volume (≥143 delivery cases quarterly). In contrast, less than 2% of delivery services were provided by the third of institutions with the lowest delivery volume (<19 delivery cases quarterly). After adjusting for confounders in the IV-logistic models, the average marginal effect of an increase of 1000 cases in delivery volume was −0.162 (95% CI −0.169 to −0.155), while the adjusted OR of delivery volume was 0.005 (95% CI 0.004 to 0.006).
Conclusion Increased delivery volume has great potential to improve maternal health outcomes, and the centralisation of delivery services might facilitate maternal health promotion in China. Our study also has implications for other developing countries facing challenges similar to those in China.

Nan Chen, 1,2 Jay Pan 1,2
Original research

BMJ Global Health

WHAT IS ALREADY KNOWN ON THIS TOPIC
⇒ Findings on the association between delivery volume and maternal health outcomes are mixed.
⇒ Few studies have explored the causal effect of delivery volume on adverse maternal health outcomes.

WHAT THIS STUDY ADDS
⇒ The instrumental variable method was applied for the first time to identify the causal effect of delivery volume on severe maternal morbidity (SMM).
⇒ Information from 810 049 delivery-related discharge records from China between 2016 and 2019 was used for the empirical analysis.
⇒ An increase of 1000 deliveries would reduce the SMM rate by 16.2%, with an OR of 0.005.

INTRODUCTION
Whether the centralisation of health services should be adopted as a strategy to promote population health is a focus of both policy-making and research. 1-6 Centralisation policy for resource allocation is also described as 'concentration' or 'regionalization'. 7 8 The volume-outcome relationship is the basis for promoting centralisation policy. 9 For many surgeries, volume has been shown to be positively associated with health outcomes because of learning effects or economies of scale. 2 9 10 In the obstetric field, several studies have investigated the relationship between delivery volume and maternal health outcomes, but their findings are mixed. While some studies indicated that delivery volume is positively correlated with maternal health outcomes, 11-15 other studies reported no statistically significant correlation. 16-19 It is challenging to explore this relationship through experimental research because, given the multiple stakeholders involved, researchers have to obtain not only the permission of administrators but also the support and cooperation of medical institutions and pregnant women. Existing studies have therefore had to rely on observational data. It should be noted that these studies generally adopted empirical analytical strategies to investigate the association. Because of confounding factors, the association identified might be a biased estimate. These endogeneity problems arise from two sources. First, delivery volume and obstetric health outcomes are in a simultaneous relationship and affect each other. Specifically, while delivery volume affects maternal health outcomes, those outcomes also affect mothers' choice of hospital and thereby affect delivery volume. Second, unobserved patient heterogeneity might confound the association. For instance, high-risk women might prefer healthcare institutions with better health outcomes, even though they are themselves more likely to have adverse outcomes.
To bridge the gap in the current literature, the instrumental variable (IV) method was adopted for the first time to identify the causal effect of delivery volume on maternal health outcomes. The IV method is a compelling approach for exploring causal effects with observational data. 20 It uses one or more exogenous IVs that are related to the exposure variable but not directly related to the outcome variable to identify the effect of the exposure variable on the outcome variable, which can yield a consistent estimate. This study selected the surrounding average number of delivery cases per institution as the IV. The selection was based on the assumption that the number of delivery cases in the area surrounding a specific hospital is positively related to its actual delivery volume without directly affecting the maternal health outcomes produced by that hospital. By applying the IV method, we examined whether rising delivery volume could improve maternal health outcomes, with the aim of providing more substantial evidence on the causal effect.
This study used maternal information from a populous province in China with 83 million residents, covering the fourth quarter of each of 4 years (2016 to 2019). To date, existing evidence on the volume-outcome relationship has come from high-income countries. 11-19 21 However, low-income and middle-income countries differ considerably from high-income countries in economic development, educational levels, sanitary conditions and other factors that may affect the volume-outcome relationship. As China is the world's largest developing country, results from China have great potential to inform health-related decision-making in other low-income and middle-income countries facing similar issues.

METHODS
Study area and data source
Sichuan province is located in south-western China, with an area of 486 000 km² and 83.67 million residents. As the province resembles China as a whole in its geographical, demographic and economic distribution, findings based on this study area are believed to have relatively good external validity. 22 In the east of China, the main geomorphological features are plains and hills, with nearly 41% of the country's population and a high level of economic development. In contrast, the western region of China is dominated by plateaus and mountains, with a relatively sparse population and a low level of economic development. Similarly, the east of Sichuan province is dominated by plains and hills, with a dense population and rapid economic growth, while the west of Sichuan is dominated by mountainous areas, with a sparse population and lagging economic development. These regional gaps lead to significant disparities in the development of medical and health services, including obstetric services. Quarterly delivery volume ranges from 1 to 5664 in our database, and this wide variation makes IV estimation feasible.
All discharge data for inpatients discharged during the fourth quarters of 2016 to 2019 (1 October to 31 December of each year) were included for analysis. The discharge data contained essential demographic characteristics, disease diagnoses, the conditions and dates of admission and discharge, surgery and operation procedures, the mode of discharge, and expenditure information for every case. 23 The health institution annual report data between 2016 and 2019 were used to describe institutional characteristics, including the address, ownership type, hospital level and number of beds of each health institution. 24 These two data sets were extracted from the Sichuan Health Statistics Data Collection and Decision Support System. Public statistical yearbook data between 2016 and 2019 were used to describe region-specific characteristics. The latest version of road network data was retrieved from the National Catalogue Service for Geographic Information system. The geographical coordinates of health institutions and patients were obtained from their addresses using the geocoding Application Programming Interface of Baidu Map, a web-based map service frequently used by Chinese residents. 25 The travel time between patients and healthcare institutions, or between two institutions, was calculated in ArcGIS V.10.5 from the coordinate information and the road network data.
Data screening was performed in the following steps. First, all records potentially related to delivery were identified by screening the discharge data collected in the last quarters of 2016 to 2019 in Sichuan province. The number of preliminarily identified delivery-related records was 906 451. Records meeting any of the following exclusion criteria were then removed to ensure data validity: (1) abortion records; (2) records with age outside the range of 15 to 49 years; (3) male records. After screening, a total of 810 049 records were retained (figure 1).
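The three exclusion criteria above can be sketched as a simple record filter. This is a minimal illustration only: the record fields (`age`, `sex`, `is_abortion`) are hypothetical names, since the source does not describe the discharge data schema, and the age bounds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class DischargeRecord:
    # Hypothetical fields; the real discharge data schema is not given in the source.
    age: int
    sex: str           # "F" or "M"
    is_abortion: bool  # True if the record was coded as an abortion


def keep_record(rec: DischargeRecord) -> bool:
    """Return True if a delivery-related record passes all three exclusion criteria."""
    if rec.is_abortion:          # criterion (1): abortion records
        return False
    if not 15 <= rec.age <= 49:  # criterion (2): age outside 15-49 (bounds assumed inclusive)
        return False
    if rec.sex == "M":           # criterion (3): male records
        return False
    return True


def screen(records):
    """Keep only the records that pass the exclusion criteria."""
    return [r for r in records if keep_record(r)]
```

Applied to the 906 451 preliminarily identified records, a filter of this kind would correspond to the step that leaves the 810 049 retained records.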
The quality of the discharge data has always been the focus of China's health administrative department. With the gradual promotion of diagnosis-related groups in China, the coding quality of the discharge data is increasingly important. Since many indicators of performance evaluation of health institutions are calculated based on discharge data, institutions pay more and more attention to the quality of data.
The quality of the data was generally acceptable. According to the bulletin, the number of admissions in Sichuan province within one quarter was approximately 4.56-4.95 million, and the in-facility delivery rate was 99.32%-99.77%. 26 Our database contained 4.57-4.97 million discharge records per quarter, indicating that our data had great potential to reflect the overall situation of inpatient services delivered by healthcare institutions across Sichuan province.
Based on the number of neonatal births recorded in the health statistical yearbook, the number of in-facility births within one quarter was predicted to be 243 894-255 328. 26 The total number of deliveries within one quarter in our database ranged from 186 353 to 210 910 (including multiple births). Despite seasonal fluctuations in the number of delivery cases and discrepancies in coding quality, about 78%-85% of all maternal deliveries were successfully identified from the discharge database.

Variables
We used severe maternal morbidity (SMM) as the maternal health outcome indicator. SMM is one of the most commonly adopted obstetric outcome indicators in many countries. 27 28 It is also described as 'maternal near miss' or 'near miss morbidity'. The WHO has defined SMM as 'a woman who nearly died but survived a complication'. 27 28 The Centers for Disease Control and Prevention (CDC) in the USA recommended SMM as a quality monitoring indicator of maternal care. 29 30 In recent years, some researchers in China have started to promote SMM as a maternal outcome indicator. 28 31 In this study, we adapted the SMM definition previously proposed in the USA to China's hospital discharge data (online supplemental table 1). The International Classification of Diseases (ICD) diagnosis and procedure codes in the discharge data were used to identify SMM cases, based on 21 indicators. 30 (In the continuum of SMM, maternal death is the end event that is worse than SMM. We reran all the models considering both SMM and death as health outcomes. Because maternal mortality was very low, the results showed little difference (the difference was in the fourth decimal place). Owing to space limitations, we do not present them in this paper.)
The exposure variable was the delivery volume, defined as the number of deliveries in a quarter within each institution. Delivery volume entered the models as a continuous variable and was treated as a categorical variable in the descriptive analysis. The institutions were divided into three groups according to tertiles, the classification method most commonly used in the literature. 11-13 18 As a result, the low-volume institutions had fewer than 19 deliveries per quarter, the medium-volume institutions had between 19 and 142 deliveries per quarter, and the high-volume institutions had 143 or more deliveries per quarter.
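The tertile grouping used in the descriptive analysis can be written as a small helper. This is an illustrative sketch: the function name is hypothetical, and the cut-offs are the ones reported in the text (fewer than 19, 19 to 142, and 143 or more deliveries per quarter).

```python
def volume_group(quarterly_deliveries: int) -> str:
    """Classify an institution by its quarterly delivery volume using the reported tertile cut-offs."""
    if quarterly_deliveries < 1:
        raise ValueError("an institution in the sample has at least one delivery per quarter")
    if quarterly_deliveries < 19:
        return "low"
    if quarterly_deliveries <= 142:
        return "medium"
    return "high"
```

In the regression models the volume itself is used as a continuous variable, so this grouping only matters for the descriptive tables and figures.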
Other potential confounders, including patient, institutional and regional characteristics, were considered. The patient characteristics included demographic and socioeconomic variables such as maternal age, ethnic minority status, marital status, rural or urban residence and health insurance type. There were three types of social health insurance programmes in China during the study period: 32 Urban Employment Basic Medical Insurance (UEBMI), Urban Residents Basic Medical Insurance (URBMI) and the New Cooperative Medical Scheme (NCMS). The type of social health insurance was related to the socioeconomic status of mothers and the medical resources available to them. UEBMI covered people in employment, URBMI covered unemployed people living in cities and towns, and NCMS covered people living in rural areas. The risk level of delivery was a significant confounder of the relationship between delivery volume and SMM. 11 21 Differences in delivery risk levels could partially reflect the heterogeneity of patients and help to characterise the endogeneity problems. We classified deliveries into high-risk and low-risk groups based on a risk factor list. A high-risk delivery had at least one risk factor in the code list, which combined the risk factor codes from the Society for Maternal-Fetal Medicine and the Chinese experts' consensus 33 34 (online supplemental table 1). In addition to the risk groups, the admission source might partially reflect the delivery risk and was therefore adjusted for as a confounder. The institutional characteristics included the hospital level, ownership type, number of beds and number of beds for Obstetrics and Gynaecology (OG). The regional characteristics included the gross domestic product per capita (GDP per capita) and the urbanisation rate. As this study did not intend to investigate time trends in the causal effect, the time variable (year) entered the models as a dummy variable.

Statistical analysis
In the descriptive analysis, counts and proportions were used for categorical variables, and χ² tests were used to test the differences between groups. For continuous variables, medians and IQRs were used for descriptive statistics, and Kruskal-Wallis rank-sum tests were used to test the differences between groups.
To check for collinearity problems, we used the variance inflation factor (VIF) to measure multicollinearity. As the dependent variable SMM was dichotomous, linear probability models and traditional logistic regression models were applied. For limited dependent variables, the coefficient estimates of the linear and non-linear models are typically very close, and both methods produce robust results. 35 The linear probability and logistic regression models measured the association between delivery volume and SMM:
$$G\big(\Pr(\mathrm{SMM}=1 \mid V, L)\big) = \beta_0 + \beta_1 V + \beta_2 L \qquad (1)$$
In equation (1), SMM=1 indicated the occurrence of SMM. The delivery volume was indicated by $V$, and $L$ were the confounders, including individual characteristics and institutional characteristics. Pr was the probability of SMM conditional on the exposure variable $V$ and the confounders $L$. The parameter of interest was $\beta_1$, which showed the relationship between delivery volume and SMM. In the linear probability model, the link function $G$ was the identity link. In the logistic regression model, the link function $G$ was the logit link.
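For reference, the VIF mentioned at the start of this paragraph is conventionally defined per covariate as follows. The notation $R_j^2$ (the coefficient of determination from regressing covariate $j$ on all the other covariates) is introduced here for illustration and does not appear in the source.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

Under this definition, a VIF below the commonly used threshold of 10 corresponds to $R_j^2 < 0.9$, meaning that less than 90% of the variance of covariate $j$ is explained by the other covariates.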
Two endogeneity problems might lead to a biased estimate: reverse causality and unobserved patient heterogeneity. Reverse causality means that the causal effect could run in either direction. On the one hand, learning effects and economies of scale support a causal effect of delivery volume on SMM. 9 36 On the other hand, better outcomes might lead to a good reputation and greater attractiveness. Because women can freely choose their delivery institution in China, a better reputation can attract more patients. Reverse causality results in a downward bias in the coefficient of delivery volume. The omission of unobserved patient heterogeneity might also cause endogeneity of delivery volume. Patient characteristics could influence the choice of institution. For example, university hospitals often receive patients with more severe comorbidities. 36 Women with worse health conditions, who are more likely to have adverse outcomes, may have a stronger desire to choose a better healthcare institution for delivery. Even though we tried to adjust for delivery risk by controlling for the risk group variable, the binary risk groups were too crude to reflect the heterogeneity of delivery risk among mothers accurately. Omission of patient heterogeneity might lead to an upward bias in the coefficient of delivery volume.
IV estimation was used to address these endogeneity problems. We chose the average number of delivery cases per institution in the region surrounding a specific delivery institution as the instrumental variable. We assumed that the actual delivery volume of a specific institution would increase if more delivering mothers lived in the area around it, and would decrease if other institutions in the surrounding area also provided delivery services. The ratio combined the information from both sides. Similar variables have been used as instruments in studies of hip fractures and coronary artery bypass grafts. 9 36 The search area was limited to 2 hours' driving distance, based on the geographical accessibility target proposed by the Lancet Commission on Global Surgery. 37 This instrumental variable was assumed to meet three conditions. First, it was associated with delivery volume and would affect health outcomes only through delivery volume. Distance and convenience were two influential factors in the selection of delivery institutions. 38 In general, patients preferred closer health institutions for convenience. Second, the residence of mothers could be considered exogenous to maternal outcomes, as it should have no direct influence on the quality of health services. It is unlikely that most mothers choose where to live solely on the basis of the quality of obstetric delivery services. Although this condition holds in most cases, some confounders might still challenge the exclusion restriction assumption, such as the regional development level, the socioeconomic status and financial situation of the mother, and the delivery risk level. More developed regions may have a denser population and higher quality medical resources, which creates a link between delivery volume and health outcomes. Mothers with higher socioeconomic status and a better financial situation are more likely to live in areas with high-quality resources.
Similarly, mothers with a higher delivery risk level are more motivated to live near high-quality institutions. Considering these confounding effects, we added regional, institutional and individual characteristics to the models to meet the exclusion restriction. This approach has also been used in other studies. 9 39 The assumption that the instrument met the second condition is more plausible after controlling for the potential confounders. Third, the instrument was monotonic. It was unlikely that an increased number of potential patients or a reduced number of potential institutions would lower delivery volume. Under the monotonicity condition, only a local average causal effect was identified: the IV estimation measures the causal effect for institutions that satisfy the monotonicity assumption. The estimated effect was the 'complier average causal effect (CACE)'.
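One way to read the instrument described above is as a ratio: the number of delivering mothers living within the search area of an institution, divided by the number of delivery institutions in that same area. The sketch below illustrates that reading only. The function name, the use of minutes, and the choice to count the focal institution itself in the denominator are assumptions made for the example, not details confirmed by the source.

```python
from typing import Sequence


def surrounding_average(
    mother_travel_min: Sequence[float],
    institution_travel_min: Sequence[float],
    threshold_min: float = 120.0,
) -> float:
    """Delivering mothers within the search area divided by delivery institutions within it.

    mother_travel_min: travel times (minutes) from the focal institution to each
        mother's residence (hypothetical input).
    institution_travel_min: travel times (minutes) from the focal institution to every
        delivery institution, assumed here to include the focal institution itself (0.0).
    threshold_min: search radius in minutes; 120 mirrors the 2 hours' driving distance.
    """
    mothers = sum(1 for t in mother_travel_min if t <= threshold_min)
    institutions = sum(1 for t in institution_travel_min if t <= threshold_min)
    if institutions == 0:
        raise ValueError("no delivery institution lies within the search area")
    return mothers / institutions
```

Passing `threshold_min=60.0` would correspond to a narrower search area of 1 hour's driving distance.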
A two-stage estimation technique was used in the IV models. The first-stage model was as follows:
$$V = \alpha_0 + \alpha_1 Z + \alpha_2 L + \varepsilon \qquad (2)$$
where $V$ was the delivery volume, $Z$ was the IV, $L$ indicated the observed confounders and $\varepsilon$ was the error term. The fitted values of delivery volume, $\hat{V}$, were used in the following second-stage model:
$$G\big(\Pr(\mathrm{SMM}=1 \mid \hat{V}, L)\big) = \gamma_0 + \gamma_1 \hat{V} + \gamma_2 L \qquad (3)$$
where $\gamma_1$ measured the causal effect of delivery volume on SMM. In the ordinary IV model, the link function $G$ was the identity link and the model was estimated by the ordinary two-stage least squares (TSLS) approach. In the IV-logistic model, the link function $G$ was the logit link and the model was estimated by the two-stage estimation approach. In the IV-logistic model, we applied the 'sandwich formula' to the whole equation system to obtain the SEs, using the R package 'ivtools'. 39 The average marginal effect (AME) of delivery volume was obtained with the R package 'margins'. An endogeneity test was applied to assess the endogeneity of delivery volume, and the F test for weak instruments was applied to assess whether the instrument was weak. All analyses were performed with R V.4.1.1 and ArcGIS V.10.5.
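The logic of the two-stage approach can be illustrated with a toy simulation. The sketch below is not the authors' analysis, which was done in R with 'ivtools'. It uses a continuous outcome, a single instrument and no covariates, and every name and parameter value in it is made up for illustration. An unobserved confounder `U` raises both the exposure and the outcome, so a naive regression is biased, while the two-stage estimate recovers the effect built into the simulation.

```python
import random


def ols(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, v, y):
    """Stage 1: regress V on Z. Stage 2: regress Y on the fitted values of V."""
    a0, a1 = ols(z, v)
    v_hat = [a0 + a1 * zi for zi in z]
    _, gamma1 = ols(v_hat, y)
    return gamma1


def simulate(n=20000, true_effect=-0.5, seed=0):
    """Toy data: Z shifts V only; U is an unobserved confounder of V and Y."""
    rng = random.Random(seed)
    z, v, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)
        vi = zi + ui + rng.gauss(0.0, 1.0)
        yi = true_effect * vi + 2.0 * ui + rng.gauss(0.0, 1.0)
        z.append(zi)
        v.append(vi)
        y.append(yi)
    return z, v, y


z, v, y = simulate()
print(f"naive OLS slope: {ols(v, y)[1]:.3f}")
print(f"2SLS slope:      {two_stage_least_squares(z, v, y):.3f}")
```

In this toy setting the confounder pushes the naive slope upwards, even flipping its sign, whereas the two-stage slope stays close to the simulated effect of -0.5. The IV-logistic model in the paper replaces the second-stage linear fit with a logit link and uses sandwich SEs for the whole system, which this sketch does not attempt.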
Sensitivity analyses were conducted to test robustness. First, the 2 hours' driving distance was replaced by 1 hour's driving distance to check the impact of the search area. Second, the outcome variable SMM was recalculated excluding blood product transfusion, because its inclusion in the definition of SMM is controversial: blood transfusion has been argued to be a life-saving technique rather than an indicator of SMM. 27 Third, institutions with fewer than 10 deliveries were excluded to test the impact of extreme values.

Patient and public involvement
Both administrative data and public data were used in this study. No patient was recruited or involved in the research. Patients were not invited to contribute to the designing, writing or editing process throughout the research.
RESULTS
A total of 810 049 delivery cases were included in the analysis. The overall SMM rate fluctuated during the 4 years, remained below 4% and showed a reduction of 5.4% from 2016 to 2019 (table 1). The blood product transfusion indicator detected the largest number of SMM cases, as reported by previous studies from the USA. 30 The rate of blood product transfusion decreased slightly during the 4 years. In contrast, the rates of severe anaesthesia complications, air and thrombotic embolism, adult respiratory distress syndrome, sepsis and ventilation increased significantly during this period, while the rates of acute renal failure and amniotic fluid embolism decreased significantly.
The first column of table 2 shows the descriptive statistics for the whole sample. The overall SMM rate was about 3.31%, and the median maternal age was 27 years. Around 10.97% of mothers belonged to ethnic minority groups, 95.98% had been married and 50.69% lived in an urban area. The most common payment types were NCMS (30.53%), URBMI (24.2%) and fully self-paid (21.85%). High-risk deliveries accounted for 49.74%. Most patients were admitted through the outpatient department (77.92%), while nearly 20% were admitted through the emergency department. Regarding hospital levels and ownership types, most patients gave birth in tertiary and public non-profit institutions, and urban institutions had more deliveries than rural institutions. The total number of deliveries decreased between 2016 and 2019.

The other columns in table 2 show the differences between delivery volume groups. All the differences were tested by the χ² test or the Kruskal-Wallis rank-sum test. Owing to the large sample size, all the differences were statistically significant. Without any adjustment, SMM was more frequent in the high-volume institutions than in the low-volume and medium-volume institutions. At the same time, the proportion of high-risk deliveries in the high-volume institutions (51.83%) was more than twice that in the low-volume institutions (22.32%). This positive correlation suggested that patient heterogeneity was an important confounding factor that could cause endogeneity problems. The proportion of patients belonging to ethnic minority groups was highest in the medium-volume institutions, while the proportion of married mothers was highest in the high-volume institutions. In terms of health insurance types, UEBMI and fully self-paid patients preferred high-volume institutions, URBMI patients preferred low-volume institutions and NCMS patients preferred medium-volume institutions. In addition, patients living in rural areas were more likely than those living in urban areas to choose low-volume and medium-volume institutions. The median number of beds, doctors and nurses, the regional GDP per capita and the urbanisation rate increased with delivery volume. The number of deliveries within each volume subgroup decreased from 2017 to 2019.
At the institutional level, obstetric deliveries were mainly clustered in high-volume institutions. As illustrated by figure 2, the high-volume institutions provided more than 86% of delivery services, while the low-volume institutions offered less than 2% of delivery services. The crude SMM rate without any adjustment was higher in the high-volume institutions than in the medium-volume and low-volume institutions (figure 2).

Regression analysis
The quarterly delivery volume was divided by 1000 before being entered into the regression models to make the results more readable. The VIF values were all below 10, suggesting that the level of multicollinearity was acceptable. Table 3 shows the estimation results of the regression models.

Table note: There was no ICD-10 code in China that could indicate heart failure/arrest during surgery or procedure. Hence, the tenth indicator could not be identified in our data. ICD-10, International Classification of Diseases Tenth Revision; NA, not available; SMM, severe maternal morbidity.

Models (1) to (5) showed the results of the linear probability models. In model (1), the association between delivery volume and SMM was estimated without any adjustment. The patient, institutional and regional characteristics were added separately in models (2) to (4), and all were included in model (5). The year dummy variables were controlled for in models (2) to (5). The results of the linear probability models were not consistent. The coefficients of delivery volume were positive in models (1), (2) and (4), but not statistically significant in models (2) and (4). Models (3) and (5) showed a negative association between delivery volume and SMM. Similar covariate settings were applied in logistic regression models (6) to (10). The AME estimates of the logistic regression models were very similar to those of the linear probability models. The coefficient of delivery volume in the full linear probability model (5) was −0.021 (95% CI −0.022 to −0.020), which indicated that each increase of 1000 cases in delivery volume was associated with a 2.1% reduction in the SMM rate. The AME of delivery volume in logistic regression model (10) was −0.015 (95% CI −0.016 to −0.014), suggesting that each increase of 1000 deliveries was associated with a 1.5% reduction in the SMM rate. The adjusted OR (AOR) in model (10) was 0.615 (95% CI 0.594 to 0.636), which showed that each increase of 1000 deliveries was associated with a 38.5% reduction in the odds of SMM. The linear probability models and logistic regression models measured only the association between delivery volume and SMM rather than a causal effect.
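The step from an adjusted OR to a percentage change in odds is simple arithmetic, sketched here with a hypothetical helper: an OR of 0.615 means the odds are multiplied by 0.615, that is, reduced by (1 − 0.615) × 100 = 38.5%.

```python
def odds_change_pct(odds_ratio: float) -> float:
    """Percentage change in the odds implied by an odds ratio (negative means a reduction)."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0


# AOR from logistic regression model (10) and from the IV-logistic model reported in the paper.
for aor in (0.615, 0.005):
    print(f"AOR {aor}: {odds_change_pct(aor):.1f}% change in odds")
```

The same conversion applied to the IV-logistic AOR of 0.005 gives the 99.5% reduction in odds quoted for the preferred specification. Note that this is a change in odds, not in probability; the AME is the quantity that describes the change in the SMM rate.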
Figure 2 Delivery proportion, institution proportion and severe maternal morbidity (SMM) rate by delivery volume groups.

The causal effects were estimated by IV models, as shown in models (11) to (20). The significant F statistics and adjusted R² of the first-stage regression implied that the instrument was not weak. The coefficient for the surrounding delivery volume average was positive, as expected, suggesting that an increased delivery volume average in the surrounding areas was associated with more delivery cases in a specific institution (online supplemental table 2). The coefficients of delivery volume in all IV models were negative and the AORs were all smaller than 1 under different covariate settings, suggesting that delivery volume was an independent protective factor against SMM. Model (15) implied that an increase of 1000 deliveries could reduce the SMM rate by 13.3%. The marginal effect of the ordinary IV estimation in model (15) was a constant value equal to the coefficient. For the IV-logistic regression model, the marginal effect decreased with increased delivery volume, which is more reasonable than a constant value. The AME and AOR of the preferred specification, model (20), were −0.162 (95% CI −0.169 to −0.155) and 0.005 (95% CI 0.004 to 0.006), respectively, which implied that an increase of 1000 deliveries could lead to reductions of 16.2% and 99.5% in the rate and odds of SMM, respectively. The full estimation results are shown in online supplemental table 3.
Reading table 3 by column, the first column presented the estimates without any covariates, the second column displayed the estimates after controlling for individual characteristics and year, the third column showed the estimates after controlling for institutional characteristics and year, the fourth column presented the estimates after controlling for regional characteristics and year, and the last column contained the estimates after controlling for all the confounders. Whichever category of confounders was added on its own, the estimates were lower than the unadjusted ones, and using the IV method reduced the estimates further.
Of the three categories of factors, institutional characteristics had the greatest impact on the estimates, and the resulting estimates were closest to those in the fifth column, which considered all confounders. Of the remaining two categories, regional characteristics had a greater impact on the estimates than individual characteristics.

Sensitivity analysis
We used three different IV-logistic regression models to assess the robustness of the results. The first sensitivity analysis replaced the 2 hours' driving distance with 1 hour. The AME and AOR were −0.153 (95% CI −0.159 to −0.147) and 0.007 (95% CI 0.006 to 0.008), respectively. The second sensitivity analysis excluded blood product transfusion from the definition of the dependent variable SMM. The AME and AOR were −0.162 (95% CI −0.169 to −0.155) and 0.294 (95% CI 0.183 to 0.470), respectively.
The third sensitivity analysis excluded institutions with fewer than 10 deliveries per quarter. A total of 4054 delivery cases and 1104 institution-years were excluded. The AME and AOR were −0.163 (95% CI −0.170 to −0.156) and 0.005 (95% CI 0.004 to 0.006), respectively. These sensitivity analyses demonstrated that the causal effect of delivery volume on SMM was robust (online supplemental table 4).

DISCUSSION
Our study examined the protective causal effect of delivery volume on SMM. The adjusted marginal effect estimates of the linear probability model and the logistic regression model were much smaller than the IV estimates because of endogeneity problems. The institutional characteristics had the largest impact on the effect estimates, suggesting that the volume-outcome relationship was most apparent among healthcare institutions with similar characteristics. Two potential mechanisms might explain this causal effect. First, the 'practice-makes-perfect' learning effect contributes to the causal effect: doctors tend to acquire more medical expertise and gain more clinical experience from an increased number of cases. 9 Second, large-scale institutions typically comprise a wide range of clinical departments with high-quality medical resources and are thus capable of providing high-quality health services, which is the 'economies of scale' effect. 36 Both mechanisms tend to operate at the institutional level, thus partially contributing to the outcomes.
Our findings are consistent with the results of previous studies, including those conducted by Bozzuto et al (2019), 11 Aoyama et al (2019) 12 and Campbell et al (2018). 13 A similar volume-outcome association has also been identified between delivery volume and caesarean delivery, 15 40-43 between delivery volume and postpartum haemorrhage, 14 44 and between delivery volume and neonatal morbidity. 13 45 The logistic regression model was the model most commonly adopted by the previous studies mentioned above. Hierarchical generalised linear models, marginal log-linear models and mixed-effects spline regression have also been applied by relevant studies, but none of them investigated the causal effect or endogeneity problems.
Our findings differ from the results reported by Booker et al (2018) and Clapp et al (2017), which suggested that delivery volume was not significantly associated with severe morbidity risk or caesarean section after controlling for patient and hospital characteristics. 16 18 Both studies employed multivariable regression models to measure the association rather than the causal effect. In addition, both used only categorical variables to measure institutional heterogeneity, which was crude and insufficient. More accurate indicators of institutional characteristics should be added to the models in order to avoid confounding problems.
All the relevant studies mentioned above are from the USA except two from Korea. 15 40 To the best of our knowledge, this is the first study to provide evidence from a developing country on the causal effect of delivery volume on maternal health outcomes, which also sheds light on China's remarkable milestones in maternal health promotion from a novel perspective. In China, maternal mortality has been significantly reduced by dramatically increased in-hospital delivery rates over the past decades. [46][47][48] In 2000, China initiated a safe motherhood programme that encouraged in-hospital delivery and discouraged community midwifery and home-based delivery. 49 This policy promoted the primary centralisation of obstetric delivery services from families to healthcare institutions. This centralisation trend is still ongoing. As the nationwide fertility rate continues to decline, some low-volume delivery institutions may even have no obstetric delivery cases, as in other low-fertility countries. 19 50-53 Meanwhile, intensifying urbanisation and labour force migration have also reinforced the tendency of women of childbearing age to concentrate in urban areas. In Sichuan, the number of OG doctors rose from 14 244 in 2016 to 15 915 in 2019, while the number of delivery institutions decreased from 1168 to 1063 over the same period. Given the inevitable trend towards centralisation of delivery services, it is important and urgent to study the impact of centralisation on service quality. Estimating the causal volume-outcome effect is a recognised and important approach to assessing the impact of a centralisation strategy on service quality. It is of great significance to verify its existence in the field of delivery services in China. The results can provide a reference for policy makers seeking to promote a centralisation strategy.
We also took the lead in applying the SMM definition to hospital discharge data in the context of China's healthcare system. As indicated by a scoping review of SMM recently published in BMJ Global Health, diverse versions of SMM definitions have been applied. 27 The CDC-based SMM definition has mainly been applied in the USA and has been further modified into Canadian, Australian and Swedish versions. 27 Likewise, this study adapted the SMM definition from the USA into a Chinese version. The ICD code-based SMM definition is a labour-saving and easily generalised method, which has great potential for the long-term surveillance of obstetric quality. Meanwhile, SMM was sensitive to age-related risk. 27 According to a previous study, 78.7% of maternal deaths were identified with SMM, and 1% of women with SMM died. 21 Targeting the reduction of SMM allows death-related critical points to be controlled at an earlier stage and supports early prevention for pregnant women.
It is difficult to recommend specific minimum delivery volume thresholds because of several study limitations. First, the discharge data used in our study only contained information collected during hospitalisations. SMM is only a relatively general maternal outcome indicator that partially reflects the quality of obstetric delivery. More specific indicators associated with both maternal and neonatal outcomes should be considered when identifying delivery volume thresholds. Second, the SMM rate might have been underestimated in our study because the data sets we analysed only contained the first procedure codes, leading to the omission of some SMM cases. As an attempt to identify the thresholds, we divided the delivery volume into two groups and estimated the coefficient of the high-volume delivery group in the models. The ordinary IV models and TSLS estimation were applied. The distribution of delivery volume was skewed (online supplemental figure 1), and the 25%, 50% and 75% quantiles of quarterly delivery volume at the individual level were 260, 501 and 893, respectively. Therefore, the threshold value was set as an integer ranging from 10 to 1000. The coefficients and their significance changed with the threshold value, as shown in online supplemental figure 2. The P values for the coefficient tests were far below 0.001. The coefficients were all negative and their absolute values decreased as the threshold increased. As indicated by the figure, the marginal effect of delivery volume became significantly smaller once the threshold value reached 200.
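The threshold scan described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it uses a single instrument and no covariates (the study's models adjusted for confounders), the data are entirely synthetic with made-up parameters, and the names tsls_slope, scan_thresholds and simulate are hypothetical.

```python
import numpy as np

def tsls_slope(y, d, z):
    """2SLS slope of y on d, instrumenting d with z (intercept included)."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    Z = np.column_stack([np.ones_like(z), z])
    # First stage: project the (possibly endogenous) regressor on the instrument.
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    # Second stage: regress the outcome on the first-stage fitted values.
    X = np.column_stack([np.ones_like(d_hat), d_hat])
    return float(np.linalg.lstsq(X, y, rcond=None)[0][1])

def scan_thresholds(y, volume, z, thresholds):
    """Return {threshold: 2SLS coefficient of the high-volume indicator}."""
    volume = np.asarray(volume, dtype=float)
    return {t: tsls_slope(y, (volume >= t).astype(float), z) for t in thresholds}

def simulate(n=50_000, seed=0):
    """Synthetic data only; every parameter below is invented for illustration."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0, 1000, n)   # stand-in for the surrounding average volume (instrument)
    u = rng.normal(size=n)        # unobserved confounder affecting volume and outcome
    volume = np.clip(100 + 0.8 * z + 100 * u, 1, None)
    p = np.clip(0.10 - 0.00008 * volume + 0.02 * u, 0.001, 0.999)
    y = (rng.random(n) < p).astype(float)  # binary outcome, analogous to SMM
    return y, volume, z

y, volume, z = simulate()
for t, coef in scan_thresholds(y, volume, z, [50, 200, 500, 900]).items():
    print(f"threshold {t:4d}: coefficient {coef:+.4f}")
```

In practice the scan would run over every integer threshold from 10 to 1000 and record the coefficient and its P value at each step. Thresholds that place almost all observations in one group give a weak first stage, so estimates at the extremes should be read with caution.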
Given the current trend of obstetric services towards centralisation, ensuring the accessibility of obstetric services has become a critical challenge in China. Spatial accessibility is essential for maintaining the in-hospital delivery rate at a relatively high level. 54 55 Because such a centralisation trend is very likely to reduce in-hospital delivery rates and increase maternal mortality, promoting maternal health through centralisation at the potential expense of spatial accessibility requires close attention.

CONCLUSION
Using IV estimation, our study demonstrated the causal effect of delivery volume on SMM. As suggested by the IV-logistic estimation, each increase of 1000 cases in delivery volume would reduce the rate and odds of SMM by 16.2% and 99.5%, respectively. From a long-term perspective, the centralisation of obstetric delivery services is very likely to result in more high-volume healthcare institutions as well as improved maternal health outcomes. In this context, the impact of such centralisation on pregnant women's access to medical services requires close attention.