The predictive power of health system environments: a novel approach for explaining inequalities in access to maternal healthcare

Introduction: The growing use of Geographic Information Systems (GIS) to link population-level data to health facility data is key for the inclusion of health system environments in analyses of health disparities. However, such approaches commonly focus on just a couple of aspects of the health system environment and only report on the average and independent effect of each dimension.
Methods: Using GIS to link Demographic and Health Survey data on births (2008–13/14) to Service Availability and Readiness Assessment data on health facilities (2010) in Zambia, this paper rigorously measures the multiple dimensions of an accessible health system environment. Using multilevel Bayesian methods (multilevel analysis of individual heterogeneity and discriminatory accuracy), it investigates whether multidimensional health system environments defined with reference to both geographic and social location cut across individual-level and community-level heterogeneity to reliably predict facility delivery.
Results: Random intercepts representing different health system environments have an intraclass correlation coefficient of 25%, which demonstrates high levels of discriminatory accuracy. Health system environments with four or more access barriers are particularly likely to predict lower than average access to facility delivery. Including barriers related to geographic location in the non-random part of the model results in a proportional change in variance of 74% relative to only 27% for barriers related to social discrimination.
Conclusions: Health system environments defined as a combination of geographic and social location can effectively distinguish between population groups with high versus low probabilities of access. Barriers related to geographic location appear more important than social discrimination in the context of Zambian maternal healthcare access. Under a progressive universalism approach, resources should be disproportionately invested in the worst health system environments.


Sample selection
In the DHS dataset, births to mothers who migrated since the birth were excluded, as their residence at the time of the birth could not be obtained (21,034 excluded out of an original sample of 49,207). Nonsingleton births were excluded since they constitute a medical complication that is often identified prior to the birth, such that their determinants of access to care in childbirth are fundamentally different from those of singleton births (496 excluded out of 28,173). Births that occurred prior to 2008 were excluded, as the location of the birth was not recorded in the survey (16,392 excluded out of 27,677). Births that did not have a valid geo-reference were excluded (two sampling clusters and 45 births out of 11,285). Births that were not located in one of the 17 SARA districts were excluded (466 sampling clusters and 7,671 births out of 11,240). Finally, observations with missing values on covariates were excluded (99 out of 3,569), leaving a final sample of 3,470 observations.
In the SARA dataset, originally composed of 658 facilities, 17 facilities were dropped due to having no or incorrect geo-references and 45 were excluded due to being identified as located outside of the SARA districts' shapefiles through GIS analysis. The final sample was made up of 596 facilities.
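The exclusion cascade can be checked with simple arithmetic. The sketch below is only a bookkeeping aid: the counts are those reported above, while the step labels and the helper name `apply_exclusions` are illustrative and not part of the original analysis.

```python
# Exclusion steps and counts as reported in the text; labels are shorthand.
dhs_steps = [
    ("mother migrated since the birth", 21_034),
    ("nonsingleton birth", 496),
    ("birth before 2008", 16_392),
    ("no valid geo-reference", 45),
    ("outside the 17 SARA districts", 7_671),
    ("missing covariate values", 99),
]
sara_steps = [
    ("no or incorrect geo-reference", 17),
    ("outside SARA district shapefiles", 45),
]


def apply_exclusions(start, steps):
    """Return the sample size at the start and after each exclusion step."""
    remaining = [start]
    for _label, n_excluded in steps:
        remaining.append(remaining[-1] - n_excluded)
    return remaining


dhs_remaining = apply_exclusions(49_207, dhs_steps)
sara_remaining = apply_exclusions(658, sara_steps)

for (label, n_excluded), left in zip(dhs_steps, dhs_remaining[1:]):
    print(f"{label:35s} -{n_excluded:>6,d} -> {left:>6,d}")
```

Each intermediate value in `dhs_remaining` should match the "out of" denominator quoted for the following step, which makes the flow easy to audit.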

Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA)
MAIHDA is implemented using a logistic random intercepts model (Equation 1):

Equation 1: Baseline logistic random intercept model
Where: $y_{ijz}$ is facility delivery for the $i$-th birth nested in both the $j$-th community (i.e. Demographic and Health Survey sampling cluster) and the $z$-th health system environment. $\alpha$ is the overall mean of facility delivery. $\theta$ is a vector of control variables.
The two sets of random intercepts, $\mu_{1j}$ and $\mu_{2z}$, are assumed to be normally distributed with mean zero and uncorrelated with each other. The community random intercepts $\mu_{1j}$ have variance $\sigma^2_1$, are independent and identically distributed across communities, and are independent of the control variables $\theta_{ijz}$. The health system environment random intercepts $\mu_{2z}$ have variance $\sigma^2_2$, are independent and identically distributed across health system environments, and are independent of the control variables $\theta_{ijz}$. [1]
Using such a model, the predicted probability of a facility delivery can be estimated for each health system environment. These probabilities are more reliably estimated in a multilevel model than in a saturated fixed-effects model, since probabilities for rare combinations are estimated by borrowing information from the mean. [2]
The health system environments' ICC is calculated as the share of the variance attributable to the health system environment random intercepts' variance, $\sigma^2_2$, relative to the total variance, made up of the health system environment random intercepts' variance $\sigma^2_2$, the community random intercepts' variance $\sigma^2_1$, and the individual-level variance, which is set at 3.29 ($\pi^2/3$, the variance of the standard logistic distribution) in binomial logistic models (Equation 2). The ICC measures the level of discriminatory accuracy, similar to the area under the receiver operating characteristic curve (AUC). [3] The higher the ICC, the better the barrier combinations are at distinguishing between who will and will not access a facility delivery.
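Written out from the definitions above, the model and the ICC can be sketched as follows. This is a reconstruction under stated assumptions rather than the author's exact notation: $\beta$ is an assumed name for the coefficient vector on the controls, and $\sigma^2_1$ and $\sigma^2_2$ denote the variances of the community and health system environment random intercepts respectively.

```latex
% Sketch of Equation 1 (logistic random intercepts model); \beta is assumed notation
\operatorname{logit}\bigl(\Pr(y_{ijz} = 1)\bigr)
  = \alpha + \theta_{ijz}^{\top}\beta + \mu_{1j} + \mu_{2z},
\qquad
\mu_{1j} \sim N(0, \sigma^2_1), \quad
\mu_{2z} \sim N(0, \sigma^2_2)

% Sketch of Equation 2 (ICC of the health system environment random intercepts)
\mathrm{ICC} = \frac{\sigma^2_2}{\sigma^2_2 + \sigma^2_1 + 3.29}
```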
In subsequent models, I explore which dimensions have the most discriminatory accuracy by comparing the ICC of the health system environments' random intercepts in Equation 1 (model A) versus the ICC of the same random intercepts in a model that also includes barrier variable dummies as main effects (model B). This is calculated using the Proportional Change in Variance (Equation 3). [4]
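A minimal numerical sketch of the two summary measures follows. It assumes the standard definition of the proportional change in variance, applied to the health system environment random-intercept variance in model A versus model B. The function names and all variance values are hypothetical illustrations, not estimates from this study.

```python
LOGISTIC_RESIDUAL_VARIANCE = 3.29  # individual-level variance, as stated in the text


def icc(var_hse, var_community, var_individual=LOGISTIC_RESIDUAL_VARIANCE):
    """Share of total variance attributable to health system environments."""
    return var_hse / (var_hse + var_community + var_individual)


def pcv(var_model_a, var_model_b):
    """Proportional change in variance from model A to model B
    (assumed standard definition: (V_A - V_B) / V_A)."""
    return (var_model_a - var_model_b) / var_model_a


# Hypothetical variances, chosen only to make the arithmetic easy to follow.
example_icc = icc(var_hse=2.0, var_community=0.71)   # 2.0 / 6.0
example_pcv = pcv(var_model_a=2.0, var_model_b=0.5)  # 1.5 / 2.0

print(f"ICC = {example_icc:.3f}, PCV = {example_pcv:.2f}")
```

A large PCV indicates that the barrier dummies added in model B account for much of the between-environment variance that model A left in the random intercepts.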

Markov Chain Monte Carlo estimation
Estimates were generated using a Gibbs sampler (JAGS, via the rjags package) from within RStudio v1.0.143. Noninformative priors, a 5,000-iteration burn-in and 100,000 saved posterior samples were used. No initialisation values were used, but chains with different random starting points gave similar results, and traceplots indicated good levels of convergence and mixing. The Raftery-Lewis diagnostic indicated that the number of burn-in and saved samples was sufficient to estimate the 0.025 quantile of the parameters of interest to within a 0.005 margin of error with 95% probability.
Point estimates are the average of the posterior samples for the parameter of interest, while uncertainty is communicated through the credible interval (CI), defined as the smallest interval covering 95% of posterior samples for the parameter of interest. Predicted probabilities are estimated by calculating the log-odds for each health system environment in each posterior sample using the parameters estimated in the Bayesian model described above, converting the log-odds to probabilities, and averaging across posterior samples for each health system environment in order to obtain the point estimate.
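The post-processing described above can be sketched in a few lines. This is an illustration under stated assumptions, not the original code: the posterior draws are simulated stand-ins, the controls are assumed to be held at their reference (zero) values so that the log-odds reduce to the intercept plus the environment's random intercept, and the function names are hypothetical.

```python
import math
import random


def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def smallest_interval(samples, mass=0.95):
    """Narrowest interval covering at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of samples the interval must contain
    # Slide a window of k consecutive order statistics and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def predicted_probability(alpha_draws, mu_draws):
    """Posterior mean and 95% interval of the predicted probability
    for one health system environment (controls at reference values)."""
    probs = [inv_logit(a + m) for a, m in zip(alpha_draws, mu_draws)]
    return sum(probs) / len(probs), smallest_interval(probs)


# Simulated stand-ins for posterior draws (NOT results from the study).
random.seed(0)
alpha_draws = [random.gauss(0.5, 0.2) for _ in range(5000)]
mu_draws = [random.gauss(-1.0, 0.3) for _ in range(5000)]

point, (lower, upper) = predicted_probability(alpha_draws, mu_draws)
print(f"predicted probability = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Averaging after the conversion to probabilities, rather than converting the average log-odds, follows the order of operations given in the text. The two differ because the inverse logit is nonlinear.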