Understanding efficiency and the effect of pay-for-performance across health facilities in Tanzania
Peter Binyaruka1, Laura Anselmi2

1. Health System, Impact Evaluation and Policy, Ifakara Health Institute, Dar es Salaam, Tanzania
2. Health Organisation, Policy and Economics, Centre for Primary Care and Health Services Research, University of Manchester, Manchester, UK

Correspondence to Dr Peter Binyaruka, Ifakara Health Institute, Dar es Salaam, Tanzania; pbinyaruka{at}ihi.or.tz

## Abstract

Background Ensuring efficient use and allocation of limited resources is crucial to achieving the goal of universal health coverage (UHC). Performance-based financing, which provides financial incentives to health providers that reach predefined targets, would be expected to enhance technical efficiency across facilities by promoting an output-oriented payment system. However, no study has systematically assessed efficiency scores across facilities before and after the introduction of pay-for-performance (P4P). This paper seeks to fill this knowledge gap.

Methods We used data from a P4P evaluation on healthcare inputs (staff, equipment, medicines) and outputs (outpatient consultations and institutional deliveries) from 75 health facilities implementing P4P in Pwani region, and 75 from comparison districts in Tanzania. We measured technical efficiency using Data Envelopment Analysis and obtained efficiency scores across facilities before and after the P4P scheme. We analysed which factors influence technical efficiency by regressing the efficiency scores on a number of contextual factors. We also tested the impact of P4P on efficiency through a difference-in-differences regression analysis.

Results The overall technical efficiency scores ranged between 0.40 and 0.65 for hospitals and health centres, and were around 0.20 for dispensaries. Only 21% of hospitals and health centres were efficient when outpatient consultations and deliveries were considered as outputs, and <3% of all facilities were efficient when outpatient consultations alone were considered as the output. Higher efficiency scores were significantly associated with the level of care (hospital and health centre) and wealthier catchment populations. Despite no evidence of a P4P effect on efficiency on average, P4P might have improved efficiency marginally among public facilities.

Conclusion Most facilities were not operating at their full capacity, indicating potential for improving resource usage. A better understanding of the production process at the facility level and of how different healthcare financing reforms affect efficiency is needed. Effective reforms should improve not only inputs and outputs but also efficiency.

• health economics
• health policy
• health services research
• health systems
• health systems evaluation

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

### Key questions

#### What is already known?

• There is room for increasing efficiency in resource use across health facilities in low/middle-income countries.

• Healthcare financing reforms have the potential to stimulate efficiency in resource use including those focusing on output-based financing (eg, pay-for-performance (P4P)).

#### What are the new findings?

• The technical efficiency scores from 150 facilities in Tanzania ranged between 40% and 65% for hospitals and health centres, and were lowest, at around 20%, for dispensaries.

• Higher efficiency scores were significantly associated with the level of care provided (hospital and health centre) and wealthier catchment populations.

• There is no evidence that P4P improved efficiency on average, but it might have marginally improved efficiency among public facilities.

#### What do the new findings imply?

• Facilities can still improve efficiency, but the service delivery processes at the health facility level and the effect of incentive schemes need to be better understood to determine how to improve efficiency.

• The large share of technically inefficient facilities suggests that most facilities (especially dispensaries) are not operating at their full capacity or are wasting significant resources, perhaps due to a poor healthcare-seeking culture.

## Introduction

Low/middle-income countries (LMICs), especially in sub-Saharan Africa, are constrained in achieving universal health coverage (UHC) by limited resources,1–3 a double burden of disease (communicable and non-communicable diseases)4 5 and relatively poor health system performance.6 7 Despite efforts to ensure that more funding is allocated to the health sector, resources are still limited, and ensuring they are used efficiently and equitably is crucial to achieving UHC goals.8 In 2010, the World Health Report estimated that about 20%–40% of health resources are wasted globally due to inefficient and inequitable use of resources.1 9 Indeed, countries’ health systems need not only more resources for health but also efficient use of available resources. Efficiency can be either technical efficiency (minimum amount of resources used for a given level of output or maximum amount of output produced for a given level of resources) or allocative efficiency (how resource inputs and their prices are combined to produce a mix of different outputs).9

Pay-for-performance (P4P) schemes, which provide financial incentives for health workers and/or facilities reaching predefined targets, are gaining popularity in LMICs as a means of improving facility performance and strengthening health systems.10–13 Although schemes vary, P4P incentives are typically awarded to facilities or healthcare providers after the achievement of quantity and quality targets. These payments are used for facility improvement as well as for bonuses paid in addition to staff salaries.14 15 So, while incentivising output improvement, bonus payments to facilities can also contribute to the improvement of infrastructure and to the availability of drugs and supplies, among others. Some P4P schemes also offer an initial payment to facilities enrolled in the scheme to expand their capacity for service provision. Since P4P is an output-oriented financing strategy aimed at increasing the provision of targeted services, one would expect P4P to enhance efficiency in service delivery. For instance, P4P can improve allocative efficiency by incentivising an optimal mix of services that maximises health improvements, and it can improve technical efficiency by increasing the quantity and quality of services produced for a given amount of inputs.11

There is a growing body of evidence evaluating the impact of P4P in LMICs. However, most of these studies have focused on assessing the effects of P4P on service coverage,12 16 quality of care,17 equity in service delivery and resource distribution,18–20 and on facility inputs, such as human resources21–23 and medical commodities.24 25 Many P4P evaluations indicate mixed results, with notable increases in incentivised service delivery indicators. The current literature, however, pays relatively little attention to efficiency, which combines inputs and outputs. The little available evidence shows that P4P increased provider productivity in Rwanda26 and increased the efficiency of faith-based facilities in Uganda,27 and that efficiency improved over time among P4P implementing health centres in Rwanda.28 However, these studies focused on a single type of facility (eg, faith-based or health centres only27 28), used only one input and output26 or relied on a small sample of facilities and assessed trends over time without comparing them with control facilities.28

We estimated the technical efficiency of public and private health facilities providing primary and/or secondary care, and we assessed the impact of a P4P scheme by comparing intervention and control facilities in Tanzania. Previous evaluations of P4P in Tanzania revealed an increase in service coverage (outputs) and in the availability of some healthcare inputs in the facilities where it was implemented.18 24 Assessing health facility efficiency and how it changed under P4P is important and timely, as it adds to the limited evidence on the association between P4P and efficiency, which we suggest policy-makers should consider when making decisions about implementing P4P in low-resource settings like Tanzania.

### Study setting

This study took place in Tanzania, a low-income country in East Africa. Most facilities (70%) in Tanzania are government owned. The public health system has a hierarchical administration and is organised in a referral structure, with dispensaries and health centres providing primary care and hospitals providing secondary and tertiary care.29 The health system is decentralised, whereby local managers are given power to plan and manage resources.30 As in other LMICs, most health facilities in Tanzania face shortages of staff,31 drugs and supplies.32 33 Health financing is highly fragmented, with many sources including general taxation (26%), donor support (41%), out-of-pocket payments (23%) and health insurance contributions (8%).34 35 About 32% of Tanzanians are covered by health insurance: 8% as public servants, mainly through the National Health Insurance Fund, 23% as informal workers through the Community Health Fund (CHF) and 1% through private insurance.36 In 2018/2019, about 9% of total government expenditure (or total budget) was allocated to health, below the Abuja Declaration target of 15%.36 Tanzania has introduced various healthcare financing reforms in an effort to move toward UHC, including fiscal decentralisation through direct health facility financing,37 38 an improved CHF39 and the national roll-out of P4P, also called the results-based financing programme.40

In January 2011, Tanzania introduced a P4P pilot scheme in Pwani region before national roll-out, with the aim of improving maternal and child health (MCH) services.41 All health facilities providing MCH services in the region were eligible to implement the scheme. P4P performance targets for facilities were related either to the coverage of specific services (eg, institutional delivery) or to the content of care (eg, uptake of antimalarials during antenatal care), as described in more detail elsewhere.18 41 Health managers were also eligible to receive performance payouts based on the performance of the facilities in their district or region. Performance data were compiled by facilities and verified every 6 months (one cycle) before P4P payments. The maximum payout if all targets were fully attained was US$820 per cycle for dispensaries, US$3220 for health centres and US$6790 for hospitals. P4P payments were additional to funding for operational costs and salaries, which are unrelated to performance. P4P payouts at the facility level included bonuses to staff (equivalent to 10% of their monthly salary) and funds that could be used for facility improvement or demand creation initiatives (10% of the total in hospitals and 25% in lower level facilities). District and regional managers received bonus payments of up to US$3000 per cycle.

The impact evaluation of the P4P pilot showed a significant positive effect on two out of eight incentivised service indicators: institutional delivery rate and provision of antimalarials during antenatal care.18 The evaluation also documented various positive process changes, such as increased availability of drugs and supplies, increased supportive supervision, a reduced chance of patients paying user fees and greater provider kindness during institutional delivery.18 24 42 43

## Methods

### Data sources

Data were derived from two rounds of a repeated cross-sectional facility survey done in 150 health facilities from 11 districts in Pwani, Morogoro and Lindi regions, as part of an impact evaluation of the P4P pilot programme in Pwani region.18 41 In Pwani region, all 6 hospitals and 16 health centres were eligible for inclusion, and a random sample of 53 dispensaries was included in the evaluation study. An equivalent number of facilities were randomly sampled by level of care in comparison districts from Morogoro and Lindi regions. As a result, an equal number of facilities in the intervention and control study arms were included, that is, 75 facilities from Pwani region (P4P arm) and 75 facilities from Morogoro and Lindi (comparison arm). Further details on the sampling strategy of facilities are presented elsewhere.41

### Data collection

Data were collected through a facility survey questionnaire adapted from the World Bank Impact Evaluation Toolkit.44 The questionnaire had three main sections: (1) basic facility characteristics and management (staffing levels, opening hours, facility management and infrastructure), (2) availability of drugs, supplies and equipment and (3) monthly service utilisation data from facility register books. Trained enumerators administered a structured facility questionnaire to the facility in-charge or any knowledgeable health worker at each facility. The enumerators also measured inputs and contextual factors during data collection, while outputs (monthly utilisation data) were extracted from facility register books for 12 months prior to the interview. Data were collected in two rounds: January and February 2012 and 13 months later, reflecting before and after the introduction of P4P. Each round surveyed 150 facilities including 12 hospitals, 32 health centres and 106 dispensaries. Data collected have been used previously.19 24 42 43

### Input, output and contextual factor variables

This study used three healthcare inputs and two outputs to estimate efficiency scores across health facilities (table 1). Labour (clinical staff by cadre), capital (proxied by the number of beds) and medical commodities were included as key healthcare inputs to the production process for health facilities. Healthcare outputs included the volume of health services utilisation, as intermediate outputs, since health status is difficult and costly to measure.45 As data were incomplete for some facilities, we imputed the number of maternity beds in seven facilities for which it was missing in 2012 with the number of beds in 2013, assuming nothing had changed over time; we did the same for seven facilities that provided inpatient services but recorded zero beds in 2012. We replaced the missing values of outputs in 20 facilities with those from the previous or next year. Monthly mean numbers of outpatient visits and normal deliveries were computed for each year by summing the counts over the year and dividing by 12 months. We also used data on a set of contextual indicators: facility ownership status, facility level of care, location (rural/urban), catchment population characteristics, availability of outreach services, availability of CHF insurance, distance from district headquarters and frequency of external supervision. The choice of inputs, outputs and contextual factors was guided by contextual considerations and previous efficiency studies in low-income settings, conditional on data availability.46
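The output preparation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the register counts are hypothetical, and the fallback rule covers only missing values (not the recorded-zero bed case).

```python
from typing import Optional, Sequence


def monthly_mean(monthly_counts: Sequence[float]) -> float:
    """Mean monthly volume: the 12 register months summed, then divided by 12."""
    if len(monthly_counts) != 12:
        raise ValueError("expected exactly 12 monthly counts")
    return sum(monthly_counts) / 12


def fill_from_adjacent_year(value: Optional[float],
                            adjacent_year_value: Optional[float]) -> Optional[float]:
    """Keep the recorded value; if it is missing, borrow the adjacent year's value."""
    return value if value is not None else adjacent_year_value


# Hypothetical outpatient register counts for one facility (not study data).
visits_2012 = [310, 295, 330, 300, 280, 305, 320, 315, 290, 300, 310, 305]
mean_visits_2012 = monthly_mean(visits_2012)

# Hypothetical maternity bed counts: missing in 2012, recorded as 4 in 2013.
beds_2012 = fill_from_adjacent_year(None, 4)
```
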

Table 1

Definition and measurement of health facility input, output and contextual variables

### Measuring efficiency: data envelopment analysis (DEA)

Efficiency analysis methods are based on the estimation of a production frontier, which represents the maximum output that a decision-making unit, a health facility in this study, can produce for a given input combination. Efficiency scores ranging from 0, completely inefficient, to 1, efficient, indicate for each decision-making unit the proportion of output produced with respect to a hypothetical identical one producing on the frontier (ie, efficient). Generally, two types of methods are used to estimate the frontier: stochastic frontier analysis, based on parametric techniques, and DEA, based on non-parametric techniques.9 45 47–49 Both have been widely used in the literature and present advantages and disadvantages. The choice ultimately relies on the analyst's judgement based on the research question, study setting and data available.9 45–48 We used DEA because of its methodological and practical advantages. DEA accommodates multiple inputs and multiple outputs and it is a data-driven approach, which allows us to avoid specific assumptions about the functional form of the production function and the distribution of the error term. As such, DEA is also used for ‘benchmarking’ where efficient units may not necessarily form a production frontier, but instead a ‘best-practice frontier’.45
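To make the idea of a score relative to the frontier concrete, the sketch below covers the simplest possible case: one input, one output and constant returns to scale, where the frontier is set by the facility with the highest output per unit of input. This is an illustrative simplification with hypothetical numbers; the models in this paper use several inputs and require a linear programme rather than a simple ratio.

```python
from typing import List, Sequence


def crs_efficiency_single(inputs: Sequence[float],
                          outputs: Sequence[float]) -> List[float]:
    """Output-oriented scores for one input and one output under constant returns.

    Each facility's output-per-input ratio is divided by the best ratio observed,
    so the best performer scores 1 and all others score between 0 and 1.
    Assumes every input is strictly positive.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical facilities: clinical staff (input) and monthly visits (output).
staff = [2, 4, 5, 10]
visits = [200, 300, 500, 400]
scores = crs_efficiency_single(staff, visits)
```

In this toy example the first and third facilities define the best-practice frontier, and the others are scored against them.
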

In this study, we alternatively assumed constant returns to scale (CRS), that is, all units are operating at their optimal scale,50 and variable returns to scale (VRS).51 Through the VRS assumption, we were able to decompose the overall technical efficiency scores into scale efficiency and pure technical efficiency. We considered an output-oriented approach due to the hierarchical structure of the health system and the centralised procurement process, such that most public health facilities in low-income settings have a fixed set of inputs at any given time, with limited influence over them. However, health facilities have the autonomy to use limited or fixed resources in an innovative way to maximise outputs. We estimated technical efficiency for hospitals, health centres and dispensaries separately in 2012 and 2013, as inputs, outputs and healthcare delivery may be very different at each level of care. Allocative efficiency could not be considered in this paper due to the lack of reliable data on input prices. Moreover, due to the centralised procurement, the prices of staff and large equipment would not vary across facilities. We estimated DEA scores by using two models with different input and output combinations to test the relative importance of each input or output. Model 1 included four inputs (staff level, number of drugs and vaccines, medical supplies and equipment) and one output (outpatient visits) for all facilities. Model 2 included five inputs (staff level, number of drugs and vaccines, medical supplies, equipment and maternity beds) and two outputs (outpatient visits and normal deliveries) for hospitals and health centres only. Dispensaries were not included in model 2 as they are not equipped to provide institutional deliveries. We estimated efficiency scores in each model by benchmarking health facilities of the same type. We assumed CRS first and then VRS.

All health facilities with an efficiency score of 1 were considered efficient; we computed the percentage of efficient health facilities and examined how it changed over time. Efficiency scores were computed in Stata using the user-written DEA command, with CRS and VRS alternatively, as required by each model, and with output orientation. Details are presented in the online supplementary appendix.
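For readers unfamiliar with DEA, the output-oriented problem solved for each facility can be written in its standard textbook envelopment form. The notation below is an assumption introduced here for illustration (facility $o$ among $n$ peers of the same type, inputs $x_{ij}$, outputs $y_{rj}$, weights $\lambda_j$); it is not taken from the paper or its appendix.

```latex
% Output-oriented envelopment form for facility o, benchmarked against n peers.
% x_{ij}: amount of input i used by facility j; y_{rj}: amount of output r produced by facility j.
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{s.t.}\quad
  & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m \\
  & \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, && r = 1,\dots,s \\
  & \lambda_j \ge 0, && j = 1,\dots,n \\
  & \sum_{j=1}^{n} \lambda_j = 1 && \text{(VRS only; omitted under CRS)}
\end{aligned}

% Score on the 0 to 1 scale, and the decomposition of overall technical efficiency:
\mathrm{TE}_o = \frac{1}{\phi^{*}}, \qquad
\mathrm{SE}_o = \frac{\mathrm{TE}_o^{\mathrm{CRS}}}{\mathrm{TE}_o^{\mathrm{VRS}}}
```

Here $\phi^{*} \ge 1$ is the largest proportional expansion of all outputs that peer facilities show to be feasible with no more than facility $o$'s inputs. Overall technical efficiency is the CRS score, pure technical efficiency is the VRS score, and scale efficiency is their ratio.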

### Determinants of efficiency: regression analyses

To identify the determinants of technical efficiency, we regressed the efficiency scores on observed contextual factors. The efficiency scores are bounded (0 to 1) and the correlation pattern among DEA efficiency scores is typically complex and unknown. We therefore used the approach developed by Simar and Wilson52 and recently applied by Moreno-Serra et al,53 as it accounts for the bounded dependent variable, corrects the SEs by simulating the unknown error correlation among efficiency scores and calculates bootstrapped SEs.54

We pooled the efficiency scores estimated separately for dispensaries, health centres and hospitals in 2012 and in 2013 and we regressed them on a set of contextual and environmental factors (listed in table 1). We also included binary indicators to control for type of health facility, year and district. We used the Stata simarwilson command with externally estimated efficiency scores and 2000 replications. In model 1 we included dispensaries, health centres and hospitals, and in model 2 we included only health centres and hospitals. Details are presented in the online supplementary appendix.

### Effect of P4P on efficiency: difference-in-differences analysis

To test whether P4P affects efficiency scores, we first compared descriptively the changes in efficiency scores over time for P4P and non-P4P facilities. We then applied a linear difference-in-differences regression analysis based on a controlled before and after design.

We regressed efficiency scores pooled across health facilities and years on a dummy variable taking the value 1 if a facility was exposed to P4P in 2013 and 0 if not. We controlled for time-invariant determinants by including facility fixed effects and for time-specific effects by including year fixed effects. As the average programme effect may mask important heterogeneous effects,55 we further attempted to assess the heterogeneous effect of P4P by facility type, by including in the regression the interaction term between P4P in 2013 and the category considered for heterogeneity, along with the category itself. Details are presented in the online supplementary appendix. As in the analysis of the determinants of efficiency, we applied the Simar and Wilson52 approach.
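The logic of the difference-in-differences comparison can be illustrated with the simplest two-group, two-period calculation. The sketch below is a hedged illustration only: the scores are hypothetical, the function name is invented here, and it omits the facility and year fixed effects and the Simar and Wilson correction used in the actual analysis.

```python
from statistics import mean
from typing import Sequence


def did_estimate(treated_pre: Sequence[float], treated_post: Sequence[float],
                 control_pre: Sequence[float], control_post: Sequence[float]) -> float:
    """Change over time in the P4P arm minus the change in the comparison arm."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical efficiency scores for three facilities per arm (not study data).
p4p_2012 = [0.40, 0.50, 0.30]
p4p_2013 = [0.45, 0.55, 0.35]
comparison_2012 = [0.42, 0.48, 0.30]
comparison_2013 = [0.40, 0.46, 0.28]

effect = did_estimate(p4p_2012, p4p_2013, comparison_2012, comparison_2013)
```

In this toy example the P4P arm gains 0.05 while the comparison arm loses 0.02, so the estimate is about 0.07. The estimate attributes to the programme only the part of the change that differs from the comparison arm's trend.
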

The effect of P4P on efficiency can be interpreted as causal on the assumption that trends in efficiency scores across facilities in intervention and comparison sites were parallel before the start of the programme. We were unable to formally test this assumption due to the lack of multiple observations of inputs and outputs, and hence efficiency scores, before the introduction of P4P. However, trends in facility outputs (eg, service utilisation rates) were parallel prior to the intervention.18 43 As trends in inputs were also likely to be parallel, due to centralised planning and procurement, we argue that the evidence lends some support to the assumption. All analyses were performed using Stata V.15.

### Patient and public involvement

We used secondary data from the facility survey of the evaluation of P4P in Tanzania. Our data do not involve patients, but our findings will be shared with health system stakeholders.

## Results

### Descriptive statistics

Table 2 presents descriptive statistics of baseline and endline characteristics of facilities, including the levels of inputs and outputs across study arms. The baseline levels of outputs and inputs were generally similar between intervention and comparison arms, although intervention facilities had marginally higher levels of medical staff (especially paramedics) and lower availability of drugs and vaccines than comparison facilities. The other facility characteristics were fairly balanced between arms. However, most of the intervention facilities had a CHF scheme and served poorer populations than their counterparts in the comparison arm. Looking at changes over time, the availability of beds, of drugs and vaccines (in the comparison group) and of medical supplies slightly decreased. Outpatient consultations decreased in all facilities, while deliveries increased in P4P facilities and decreased in the control ones.

Table 2

Characteristics of health facilities over time by P4P arms

### Efficiency scores

The technical efficiency scores calculated assuming CRS varied by facility type, both in model 1, which included four inputs and one output (outpatient visits), and in model 2, which included an additional input, maternity beds, and an additional output, normal deliveries. The average overall technical efficiency for hospitals and health centres was 0.46 and 0.40 in model 1, and 0.65 and 0.60 in model 2, respectively (table 3 and online supplementary appendix table 1). Dispensaries had relatively low average efficiency scores of 0.20, which lowered the average efficiency scores across all health facilities in model 1. Model 1 resulted in a smaller overall percentage of technically efficient facilities (2.4%) compared with model 2 (21.3%) (table 3). Since in model 1, the average technical efficiency score was 0.46, 0.40 and 0.20 for hospitals, health centres and dispensaries respectively, this suggests that, on average, they could respectively increase their outpatient visits by about 54%, 60% and 80% without reducing inputs.
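The percentages quoted above follow from the average scores by simple arithmetic, sketched below. The function names are hypothetical. The first function reproduces the figures in the text, which express the gap as a share of the frontier output level. The second is an alternative reading added here as an assumption: it expresses the same gap relative to current output, using the definition of the score as observed output divided by frontier output.

```python
def shortfall_vs_frontier(score: float) -> float:
    """Gap to the frontier as a share of the frontier output level (1 - score)."""
    return 1 - score


def expansion_vs_current(score: float) -> float:
    """The same gap as a share of current output (1 / score - 1)."""
    return 1 / score - 1


# Average model 1 scores reported in the text.
avg_scores = {"hospitals": 0.46, "health centres": 0.40, "dispensaries": 0.20}

shortfalls = {k: round(shortfall_vs_frontier(v), 2) for k, v in avg_scores.items()}
expansions = {k: round(expansion_vs_current(v), 2) for k, v in avg_scores.items()}
```
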

Table 3

Summary of overall technical efficiency scores and share of efficient facilities

Over time, the average technical efficiency scores of health centres and dispensaries decreased, as measured in model 1, while hospitals’ efficiency scores increased from 0.45 to 0.47 (table 3). Consistently, the proportions of efficient health centres (6.3%) and dispensaries (0.9%) in 2012 both fell to 0% in 2013. The proportion of efficient hospitals remained constant throughout (16.7%). In model 2, the overall technical efficiency score and the share of efficient facilities slightly improved over time for hospitals and health centres. Furthermore, the overall technical efficiency scores were estimated assuming VRS, in order to allow a decomposition into pure technical efficiency and scale efficiency. The average pure technical efficiency and scale efficiency remained almost unchanged in both models from 2012 to 2013, except for a slight increase for hospitals and a slight decrease for health centres (online supplementary appendix table 2).

### Determinants of efficiency

Table 4 presents results on factors associated with overall technical efficiency. Efficiency scores estimated in model 1 were significantly higher by 0.281 (CI: 0.118 to 0.407) for hospitals and by 0.294 (CI: 0.180 to 0.380) for health centres, compared with dispensaries. In model 2, however, efficiency scores were significantly higher only for facilities with a relatively wealthier catchment population (0.266; CI: 0.036 to 0.483). These variations were observed over and above variations across districts. The other variables were not significantly associated with the efficiency scores.

Table 4

Determinants of efficiency

### Effect of P4P on efficiency

Results from the difference-in-differences regression presented in table 5 show that P4P was negatively but not significantly associated with technical efficiency in model 1 (−0.024; CI: −0.058 to 0.010) and positively but not significantly associated with efficiency in model 2 (0.044; CI: −0.039 to 0.125). In model 1, P4P might have marginally improved the efficiency scores in public facilities (0.060; CI: −0.013 to 0.126), while it reduced them in non-public facilities (−0.076; CI: −0.140 to −0.003). There was a positive but not significant increase in efficiency in both types of facilities in model 2. There was no evidence of a significant differential effect of P4P for urban facilities, dispensaries or facilities with a poor catchment population in terms of socioeconomic status.

Table 5

Estimated average and heterogeneous P4P effect

## Discussion

This study examined the effect of P4P on efficiency across health facilities in Tanzania. To the best of our knowledge, this is the first study to use multiple inputs and outputs to measure efficiency and the effect of P4P on efficiency by comparing P4P and non-P4P health facilities and by differentiating by facility characteristics (ie, level of care and ownership). We found overall technical efficiency scores ranging between 0.40 and 0.65 for hospitals and health centres, and lowest, at around 0.20, for dispensaries. Only 21% of hospitals and health centres were technically efficient when two outputs were considered, and less than 3% of all facilities (including dispensaries) were efficient when only one output was considered. In terms of determinants, higher efficiency scores were significantly associated with the level of care (hospital and health centre), and with wealthier catchment populations for hospitals and health centres only. There were also statistically significant variations in facilities’ efficiency across districts, but not over time. While there was no evidence of a P4P effect on efficiency on average, P4P might have marginally improved the efficiency of public facilities.

The finding that the majority of facilities were technically inefficient suggests that most health facilities (especially dispensaries) are not operating at their full capacity or are not on their efficiency frontier (ie, significant resources are being wasted). These inefficient facilities are either using inputs unnecessarily for production, which is unlikely given the low level of inputs available, or could increase service delivery with the available resources. In fact, the low technical efficiency observed is consistent with findings from other studies in sub-Saharan Africa,56–62 reporting efficiency scores typically between 0.40 and 0.75 with less than 50% of facilities being efficient. Few studies report efficiency scores above 0.75.63–66 While most studies in the region focused either on hospitals or health centres, we included all facility types, which, especially after including dispensaries, led to lower overall efficiency scores.

Higher levels of care (eg, hospitals and health centres) and wealthier catchment populations were significantly associated with higher efficiency levels in Tanzania. Low demand for services at facilities providing only basic primary healthcare services64 and patients bypassing primary healthcare facilities67–69 may explain the association between higher level of care and higher efficiency. Indeed, more outreach visits were also associated, although not statistically significantly, with higher efficiency scores, which reflects the potential of outreach services to improve demand and utilisation of health services.70 71 The efficiency scores did not differ significantly between facilities in rural and urban districts, which is consistent with findings of a study from Zambia,57 but not with findings from Ghana.56 Characteristics of the catchment population can affect efficiency. Our study revealed that wealthier catchment populations were associated with higher efficiency levels for hospitals and health centres, when institutional deliveries were included in the output. However, studies in Ethiopia found that larger catchment populations were associated with higher efficiency levels.66 72 Although we found no variation in efficiency across ownership status, other studies have found either public facilities56 or private facilities73 to be more efficient than their counterparts in LMICs. In addition, a study that re-estimated country level efficiency on WHO panel data49 revealed that public facilities could potentially be more efficient in health service delivery than their counterparts. Variations in efficiency across districts suggest that there may be district level factors still unexplored affecting efficiency, beyond rurality and catchment population characteristics.74

The implementation of the P4P programme in Tanzania neither enhanced nor reduced efficiency across health facilities. The average efficiency scores remained almost unchanged over time. However, there was a slight increase in efficiency scores in intervention facilities and a slight decrease in control facilities, which may have neutralised the effect. The lack of effect might also be due to the evaluation time frame of 1 year, which might be too short to observe an effect on efficiency. Our finding of no P4P effect is in contrast to three other studies in sub-Saharan Africa assessing P4P and efficiency. For example, Gertler and Vermeersch,26 considering only one input and output, found that the productivity of health workers on prenatal care improved in Rwanda after the introduction of P4P. Another study in Rwanda reported some changes in efficiency among P4P health centres;28 however, it used a relatively small sample of 26 health centres and simply compared changes in efficiency over time without comparison facilities. In Uganda, evidence shows that efficiency increased among facilities implementing P4P,27 but the analysis was limited to faith-based facilities only. To the best of our knowledge, ours is the only study which systematically used comparison facilities and combined facilities of different levels of care and ownership status to identify the effect of P4P on efficiency. Since the use of different sets of variables generates different conclusions, we suggest that future studies should use multiple inputs and outputs and different facility types when assessing efficiency, especially in the context of healthcare financing reforms.

Despite the lack of average effect of P4P on efficiency, we identified that P4P might have marginally increased the efficiency scores among public facilities and reduced efficiency in non-public facilities. This finding shows that important heterogeneous effects may often be hidden when focusing on the average effect only. These results may be driven by increased demand for services in public rather than non-public facilities. Public facilities in Tanzania are better able to respond to incentives as they can offer free MCH services (under the fee exemption policy) and have more financial autonomy than non-public providers like faith-based facilities.42 It is important to note that while we assess the effect that P4P may have on the efficiency with which health facilities operate, the results do not say anything about the cost-effectiveness of P4P itself, as the many costs of this reform are not accounted for.75 76

Our study revealed that less than a quarter of facilities were efficient. This indicates that only a few facilities were more successful in converting inputs into outputs than others, despite being in a similar resource-constrained context. Additionally, it implies that the majority of facilities (especially primary healthcare facilities) can still increase their outputs with available levels of inputs to operate at the efficiency frontier. Facilities wishing to increase their level of outputs, and eventually efficiency, may need to increase the demand for health services. This can be achieved through different strategies including the expansion of health insurance coverage or voucher schemes which provide financial protection and reduce access barriers,1 77 as well as through strategies aiming at improving quality of care, which may not only attract clients but also reduce the practice of bypassing primary healthcare facilities.78 More targeted efforts and interventions toward some facilities and locations are also needed,79 80 since the inefficiencies were more prevalent among primary health facilities, which largely serve poorer populations. Furthermore, there is a need for more research to assess the efficiency of primary healthcare facilities in LMICs. These assessments are currently very limited,46 but primary healthcare facilities are the first point of contact and serve the majority of the population.81 82 Assessing the role of local health administrations, which in similar settings often manage the distribution of resources across facilities, would also be important to understand where improvements can be made.74 Moreover, we encourage more research to assess whether performance-based provider payments (eg, P4P) can improve efficiency across facilities, as countries continue to implement health financing reforms toward UHC. In LMICs where staff and drugs are in shortage, it is important to improve efficiency by increasing output while maintaining or increasing inputs. Finally, we encourage the evaluation of reforms such as P4P in a comprehensive manner, in terms of changes in inputs, outputs and efficiency at the provider level, but also cost-effectiveness and efficiency of the reform itself.83

This study has some limitations. First, our focus has been on technical efficiency and scale efficiency, as opposed to allocative efficiency, due to the lack of data on input prices. However, allocative efficiency would not have been relevant in this study given the context of centralised purchasing and allocation of healthcare inputs. Second, as in the majority of efficiency studies in similar settings,46 the outputs considered were intermediate outcomes, which are simply proxy measures of health outcomes, so that the interpretation of results relies on the assumption that reduction in service delivery was not driven by improvements in health. We were limited in the measurement of service delivery (not including inpatient use or outreach activities) and of recurrent expenditure as an input, especially in facilities benefiting from P4P bonuses. Moreover, quality is not explicitly accounted for. Third, because our study relied on DEA due to its advantages over other methods, it reflects only relative efficiency as opposed to overall performance. Thus, a high DEA score does not necessarily mean facilities are well managed with limited scope for improvement,64 as absolute efficiency may be low. Fourth, although the assumption of parallel trends for facility outcomes prior to the intervention was confirmed, we were unable to formally test this assumption for healthcare inputs and efficiency itself due to the lack of historical data on inputs.
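
The third limitation, that DEA scores are relative, can be illustrated with a deliberately simplified sketch. It assumes a single input, a single output and constant returns to scale, in which case each facility's score reduces to its output/input ratio divided by the best observed ratio. The models in this study use several inputs and outputs, so this is not the specification estimated here; the function name and the facility figures are hypothetical.

```python
def relative_efficiency(inputs, outputs):
    """Single-input, single-output efficiency under constant returns to scale:
    each unit's output/input ratio divided by the best observed ratio."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical facilities: staff numbers and monthly outpatient consultations.
staff = [4, 10, 6]
visits = [200, 800, 240]
scores = relative_efficiency(staff, visits)

# Halve every facility's output: absolute productivity falls everywhere,
# yet the scores are identical because the benchmark facility falls too.
scores_halved = relative_efficiency(staff, [v / 2 for v in visits])
print(scores, scores_halved)
```

Under these assumptions the best-performing facility always scores 1, however low its productivity is in absolute terms.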

## Conclusion

This study revealed inefficiencies in service outputs with available resources across facilities, which were even more prevalent among primary healthcare facilities. Inefficiencies were also less likely among health facilities serving wealthier populations. To address the inefficiencies, efforts are needed especially toward increasing service outputs (healthcare demand) with available inputs. The P4P programme had no effect on efficiency across facilities on average, but might have marginally improved efficiency among public facilities relative to their counterparts, perhaps due to the contextual setting. These findings still suggest a need for further assessment of the long-term effect. Although inadequate resources are an important constraint among health facilities in LMICs, the efficiency of resource utilisation is another challenge that needs attention from policy-makers, health managers and researchers.

## Acknowledgments

We would like to thank all healthcare providers, health managers and national officers who supported either the overall evaluation of P4P at the facility level or the planning of the evaluation. We also thank the whole P4P evaluation research team, including data collectors and field coordinators.

## Footnotes

• Handling editor Valery Ridde

• Twitter @peter_binyaruka

• Contributors PB conceptualised this substudy and oversaw data collection. PB wrote the first draft of the manuscript. LA contributed to the conceptualisation of the study and, together with PB, was involved in data analysis, interpretation, presentation and revision of the manuscript. All authors read and approved the final manuscript.

• Funding The Government of Norway funded the data collection for the programme evaluation that was used in this paper (grant numbers: TAN-3108 and TAN 13/0005. http://www.norad.no/en/). The funding partially supported the first author during data analysis and writing of this paper. The UKRI University of Manchester (UoM) Research England Global Challenges Research Fund (GCRF) QA Allocation partially funded the first author and funded a visiting period to start the conceptualisation and data analysis of this paper. The funding bodies had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

• Competing interests None declared.

• Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

• Patient consent for publication Not required.

• Ethics approval The evaluation study received ethical approval from the Ifakara Health Institute institutional review board (approval number: 1BI1IRB/38) and the ethics committee of the London School of Hygiene & Tropical Medicine. Study participants provided written consent to participate in this study, requiring them to sign a written consent form that was read out to them by the interviewers. This consent form was reviewed and approved by the ethics committees prior to the start of the research.

• Provenance and peer review Not commissioned; externally peer reviewed.

• Data availability statement The data have been uploaded into a data repository. The DOI URL for the dataset is https://zenodo.org/record/21709%23.
