Long-term effects of payment for performance on maternal and child health outcomes: evidence from Tanzania
1. Josephine Borghi1,
2. Peter Binyaruka2,3,
3. Iddy Mayumana4,
4. Siri Lange3,5,
5. Vincent Somville3,6,
1. 1Department of Global Health and Development, London School of Hygiene & Tropical Medicine, London, UK
2. 2Ifakara Health Institute, Dar es Salaam, Tanzania, United Republic of
3. 3Chr Michelsen Institute, Bergen, Norway
4. 4Ifakara Health Institute, Ifakara, Morogoro, Tanzania, United Republic of
5. 5Department of Health Promotion and Development, University of Bergen, Bergen, Hordaland, Norway
6. 6NHH Norwegian School of Economics, Bergen, Norway
1. Correspondence to Dr Josephine Borghi; jo.borghi{at}lshtm.ac.uk

## Abstract

Background The success of payment for performance (P4P) schemes relies on their ability to generate sustainable changes in the behaviour of healthcare providers. This paper examines short-term and longer-term effects of P4P in Tanzania and the reasons for these changes.

Methods We conducted a controlled before and after study and an embedded process evaluation. Three rounds of facility, patient and household survey data (at baseline, after 13 months and at 36 months) measured programme effects in seven intervention districts and four comparison districts. We used linear difference-in-difference regression analysis to determine programme effects, and differential effects over time. Four rounds of qualitative data examined evolution in programme design, implementation and mechanisms of change.

Results Programme effects on the rate of institutional deliveries and antimalarial treatment during antenatal care declined over time, with stock out rates of antimalarials rising back to baseline levels. P4P led to sustained improvements in kindness during deliveries, with a wider set of improvements in patient experience of care in the longer term. A change in programme management and funding delayed incentive payments, affecting performance on some indicators. The verification system became more integrated within routine systems over time, reducing the time burden on managers and health workers. Ongoing financial autonomy and supervision sustained motivational effects in those aspects of care giving not reliant on funding.

Conclusion Our study adds to limited and mixed evidence documenting how P4P effects evolve over time. Our findings highlight the importance of undertaking ongoing assessment of effects over time.

• maternal health
• health economics
• health systems evaluation

## Data availability statement

Data are available in a public, open access repository. The quantitative data for this paper are made available through Zenodo, DOI: 10.5281/zenodo.5636645, url: https://zenodo.org/record/5636646%23.YanUmtnMK3I.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.

### Key questions

#### What is already known?

• An increasing number of studies have examined the impact of payment for performance (P4P) schemes at one point in time, reporting positive effects on some targeted outcomes.

• Evidence from high-income settings suggests P4P effects diminish over time, but effects are more likely to be sustained in low-performance areas.

• Evidence from low-income settings is limited and mixed.

• We know little about why effects change over time, though there are varying hypotheses as to how and why they might change.

#### What are the new findings?

• The effects of the programme on the rate of institutional deliveries and antimalarial treatment during antenatal care declined over time, with stock out rates of intermittent presumptive treatment returning to baseline levels after an initial reduction.

• There was evidence of sustained improvements in kindness during deliveries, and indications of a wider set of improvements in patient experience of delivery care in the longer term.

• Health workers needed time to fully understand the programme. The verification system became more integrated within routine systems over time, reducing the time burden of the programme on managers and health workers.

#### What do the new findings imply?

• Our findings highlight the importance of not just evaluating the effects of P4P at one point in time, but also assessing effects on an ongoing basis.

• Evaluators should monitor changes in programme design and implementation and how these relate to outcomes, especially as schemes go from pilot to scale and are taken over by government.

• Results demonstrate the limitations of conventional evaluations of cause and effect, and the need to embrace a complex adaptive systems approach to understanding health systems and their response to P4P.

## Introduction

Coverage and quality of essential and effective health services in low-income and middle-income countries (LMICs) remain inadequate, limiting gains in health outcomes.1 2 Over the last 10 years, many LMICs have introduced performance-based incentives to strengthen health systems and enhance the coverage and quality of health services.3 Payment for performance (P4P) schemes consist of payments to healthcare providers contingent on the improvement of predefined performance indicators, though their design varies substantially across settings.4 In low-income countries, part of the payment is paid directly to health workers, and part of the payment is paid to the facility for investment in improved service delivery, with healthcare managers often receiving payments based on the performance of health facilities under their jurisdiction.4

An increasing number of studies have examined the impact of P4P schemes at one point in time, reporting positive effects on some targeted outcomes.5–9 However, there has been less attention to documenting whether and how the effects of P4P programmes vary over time. This paper contributes to filling this gap by comparing the short-term (after 13 months) and longer-term (36 months) impact of P4P in Tanzania.

There are a number of reasons why the impacts of P4P may vary over time, with temporal responses depending on programme design and actor response to this. In schemes rewarding based on threshold targets, goal gradient theory suggests effort will increase as agents move closer to the goal,10 and cease once the threshold is reached,11 12 with multiple threshold targets being expected to encourage sustained effort.13 Like many complex interventions, the design of P4P programmes is not static, and adaptations are commonplace during implementation,14 and can result in changing effects over time. Further, actor response to incentives may not be constant. When incentives are tied to tasks involving complex processes, experience and learning may be a prerequisite for improved performance, with actors taking time to understand the scheme and to develop strategies and systems to improve performance.3 15 As a result, changes in behaviour may not be observed immediately.16 It may also take time for providers to develop trust that performance payments will be made, especially in fragile states with weak accountability systems.17 Finally, there may be a lag between patient recognition of enhanced health system responsiveness linked to incentives, and the adjustment of care-seeking behaviour within the community.18 In contrast, self-determination theory posits that P4P schemes may result in weaker effects over time, in so far as monetary rewards crowd out intrinsic work motivation.19 20 Equally, results may reduce over time due to reduced salience of the scheme as incentives become normalised.

Therefore, it is difficult to hypothesise the temporal variation in programme effects. Empirical studies are needed to improve our understanding of how effects vary over time in response to programme design, implementation and contextual factors.

Numerous studies from high-income countries with good routine health information systems have examined the effects of P4P over time. In the United States, the United Kingdom and Australia,21–23 P4P effects were found to diminish over time, with suggestions that where baseline performance is lower there is more potential for longer-term effects to be sustained.23 Studies in Taiwan have found sustained effects on some services24 and sustained but reduced effects over time for others.25 In low-income settings, a few studies have examined how incentive effects vary over time, with mixed effects reported in Mozambique15 and Zimbabwe.26 However, these studies did not explore the reasons for changes in the effectiveness of P4P over time. Thus, there is a need for more evidence from LMICs to better understand the dynamic temporal effects of P4P schemes, and how and under which circumstances changes in effects occur.

This paper presents an extension of our previous evaluation of P4P in Tanzania after 13 months6 where we reported positive programme effects on two out of eight incentivised service indicators, and no effects on other indicators6; identified positive programme effects on the availability of drugs and medical supplies27; and considered the heterogeneity of effects across population28 and provider subgroups.29 Here, we consider the longer-term effects of the programme over an additional 23 months to examine whether there has been a broadening of effects over time, and an enhancement or reduction in initial achievements. In parallel, we consider whether there were changes in programme design, implementation and mechanisms that might explain variations in outcome over time.

### The P4P scheme

A P4P scheme was introduced in Pwani region in 2011 by the Ministry of Health and Social Welfare (MOHSW) with funding from the Norwegian Ministry of Foreign Affairs. The scheme provided financial payments to health facilities and district and regional health managers based on achievement of predefined targets for coverage of maternal and child health services (eg, institutional delivery; postnatal care (PNC) within 7 days of delivery) and content of care (eg, two doses of intermittent presumptive treatment (IPT) for malaria during antenatal care (ANC)) (table 1). The extensive use of service coverage indicators within the scheme distinguishes it from the fee-for-service schemes which are more commonly applied in other low-income settings. All except one of the eight incentivised service coverage indicators involved multiple thresholds based on performance in the previous cycle. One indicator (IPT provision during ANC) involved a single absolute threshold target. Performance was measured through the Health Management Information System (HMIS) every 6 months.

Table 1

Scheme design

Performance data were verified each 6-month cycle by national, regional and district stakeholders by comparing reported data to facility registers. The performance payments were shared between health workers (75% of the total) and the facility for investment in service delivery improvements (25%). The allocation of payments across health workers was at the discretion of the facility. To receive any payment, facilities had to meet at least 75% of the target, with 100% achievement being required for full payment. The maximum payout per cycle was US$820 for dispensaries; US$3220 for health centres; and US$6790 for hospitals. The health worker component is in the order of 10% of the salary for the maximum payout and average number of staff. At the district and regional level, managers were incentivised based on performance of facilities in their areas, together with drug availability and timely submission of HMIS reports, receiving up to US$3000 per cycle.
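The payment rule described above can be sketched as follows. This is an illustration only, not the scheme's actual payout formula: the text states that at least 75% of the target was required for any payment and 100% for full payment, but not how much was paid between those two points, so `partial_share` is a hypothetical parameter. The function names are also illustrative, and the sketch assumes the reported maximum payout is the total across the health worker and facility components.

```python
def payout_share(achievement: float, partial_share: float) -> float:
    """Fraction of the maximum payout earned for one target.

    achievement   -- achieved value divided by the target (1.0 = target met)
    partial_share -- ASSUMPTION: fraction paid when 75% <= achievement < 100%;
                     the source does not state this value.
    """
    if achievement >= 1.0:
        return 1.0
    if achievement >= 0.75:
        return partial_share
    return 0.0


def split_payout(total_usd: float) -> tuple[float, float]:
    """Split a payout into health worker (75%) and facility (25%) components."""
    return 0.75 * total_usd, 0.25 * total_usd


# Maximum payout per cycle for a dispensary, as reported in the text.
workers_usd, facility_usd = split_payout(820.0)
print(workers_usd, facility_usd)
```

Keeping the partial payment as an explicit argument makes the unstated part of the rule visible rather than hard-coding a guess.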

During the period 2011–2013 the implementation of P4P was supported by the Clinton Health Access Initiative (CHAI) who assisted in the calculation of payouts, participated in performance feedback meetings every cycle with district managers and healthcare workers and in data verification activities. From January 2014 Norwegian funding could no longer support bonus payments, with funding for CHAI ending in June 2014. Thereafter the MOHSW managed the scheme with the World Bank Health Innovation Trust Fund supporting bonus payments. However, agreement between the government of Tanzania and the World Bank was not finalised until March 2015, resulting in the delay of P4P payments for two cycles.30

## Methods

### Study design

This mixed-methods study was guided by a theory of change and comprised a quantitative impact assessment and a qualitative process evaluation. The impact assessment used a controlled before and after study design. Data for the impact assessment were collected at three points in time: just before the first incentive payments in January 2012, 13 months later (referred to as short term), and 36 months later (referred to as long term).32 The minimum time necessary to detect an initial programme effect was deemed to be 13 months, and 36 months was selected for the third round because it fell just before the pilot programme ended and transitioned to a Results Based Financing scheme, which was gradually rolled out nationally. Data were collected in all seven intervention districts and four comparison districts from neighbouring regions (Morogoro and Lindi) that were similar in relation to poverty and literacy rates, the rate of institutional deliveries, infant mortality, population per health facility and the number of children under 1 year of age per capita. Care was also taken to avoid districts where programmes were underway to improve maternal and child health, which could confound results.

Process evaluation data about programme design, implementation of the programme and change mechanisms were collected over three rounds in the short term (December 2011–March 2013) and one round in February 2015 to examine longer-term changes.

A theory of change guided the evaluation and was developed with reference to existing literature and based on discussion with national stakeholders. It is described in the study protocol,31 but a summary follows. P4P is expected to improve the quality of care of targeted services through an increase in health worker and manager motivation to obtain bonus payments, which is assumed to increase service coverage. If motivated to achieve targets, health workers might make services more accessible by reducing waiting time, ensuring drugs are available at the facility, following clinical guidelines that may lengthen consultations, reducing user charges and being more friendly and attentive to patients, resulting in greater patient satisfaction. Unintended consequences that could result from the P4P scheme include reductions in the use and quality of unincentivised health services. Furthermore, the quality of targeted services may decline over time, if health workers become overburdened and utilisation increases beyond available facility capacity.

### Data sources

#### Quantitative

We sampled 75 facilities from Pwani region and the same number from comparison districts, including hospitals (n=6), health centres (n=16) and dispensaries (n=53) in each arm. Comparison facilities had similar levels of outpatient care visits and staffing levels to intervention facilities. Facilities were sampled to achieve district representation, with 46% of all facilities in Pwani region being included in the sample. No sample size calculation was therefore carried out. We collected data through surveys of facilities, patient exit interviews and interviews at household level with women who had given birth in the past 12 months. The full sampling strategy is outlined in the study protocol31 but a summary follows, with more details in online supplemental appendix 1.

A total of 1500 women were sampled within the catchment areas of facilities in each arm and each round. The survey measured coverage of targeted maternal and child health services, satisfaction with delivery care, user costs for three of the targeted services and household socioeconomic characteristics.32 Seven hundred and fifty patient exit interviews were conducted in each arm per round with patients attending ANC or PNC, and women with children under 1 year of age coming for a preventive check-up or an immunisation. Sample sizes for the women and patient surveys are reported in online supplemental appendix 1. We collected data on process quality for incentivised (ANC and PNC, delivery and immunisation services) and non-incentivised services (outpatient visits for children under 5 presenting with fever, cough or diarrhoea). We measured provider adherence to clinical care guidelines for ANC (a 21-item index); waiting time (in minutes); kindness during delivery (using a 10-point scale) and patient satisfaction with provider–client interactions (an index of 13–19 items adapted from33). Facility surveys gathered data on monthly numbers of outpatient visits by age (under and over 5 years of age) from patient registers for the period January 2010 to December 2014.32 Facility surveys also gathered data on structural quality of care in terms of the availability (on the day of the survey) and stock out (in prior 90 days) of essential drugs (n=37), medical supplies (n=11) and equipment (n=16). We also looked at the availability/stock out of delivery care drugs (n=8), antimalarials (n=2) and antiretrovirals (n=7) as being related to incentivised services.32 For each of these groupings, we generated composite scores based on an unweighted mean score across items in the group, which can be interpreted as the mean percentage availability/stock‐out rate within the grouping across facilities.
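The composite scores described above can be illustrated with a short sketch. The function name and the example flags are hypothetical and are not study data; the only thing taken from the text is the rule itself, an unweighted mean across the items in a grouping, averaged across facilities.

```python
from statistics import mean


def composite_score(facility_items: list[list[int]]) -> float:
    """Mean percentage of items available (or stocked out) across facilities.

    facility_items -- one list per facility, each holding a 0/1 flag per item
                      in the grouping (1 = available, or 1 = stocked out).
    """
    # Unweighted mean across the items within each facility ...
    per_facility = [mean(items) for items in facility_items]
    # ... then averaged across facilities and expressed as a percentage.
    return 100.0 * mean(per_facility)


# Illustrative flags for a two-item grouping at three hypothetical facilities.
example = [[1, 1], [1, 0], [0, 0]]
print(composite_score(example))
```

Because every grouping item carries equal weight, a facility missing one of two antimalarials scores the same 50% as a facility missing four of eight delivery care drugs.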

#### Qualitative

The findings from the first three rounds of process evaluation data covering short-term implementation of P4P have been presented elsewhere.34 In this paper we focus on the findings from the most recent round of data collection (February and March 2015) which covers implementation in 2014. These findings were contrasted with the earlier process evaluation findings to identify implementation changes over time.

In this round, in-depth interviews were done in 24 facilities from two intervention districts (Bagamoyo and Kisarawe), including 19 dispensaries, 4 health centres and 1 hospital. Twenty-one facilities were public, the remainder were faith based/not for profit. Apart from the hospital, all the facilities were located in rural areas. In-depth interviews were done with the in-charge and/or health workers responsible for maternal and child health services and lasted about an hour. Interviews were also conducted with one or more district managers (Council Health Management Team) from four districts (Bagamoyo, Kibaha, Kisarawe and Mkuranga). The main purpose of the interviews was to understand health worker perceptions and response to the programme, including the use of bonus payments and strategies for achieving targets, and whether and how this changed over time. Sampled facilities differed in terms of remoteness, staffing numbers and characteristics. Towards the end of data collection, no new themes emerged. Two researchers (IM and SL) conducted all the interviews in Swahili. All interviews were recorded and later transcribed and translated into English.

### Data analysis

#### Quantitative

We used a linear difference-in-difference regression model with facility and year fixed effects to determine the effects of P4P over time and the difference between the short-term and the longer-term effects. To determine the short-term effects of the programme (2012–2013), we compared the changes in outcomes at 13 months compared with the baseline in P4P facilities to the change in facilities without P4P. To determine the longer-term effects of the programme (2012–2015), we compared the change in outcomes at 36 months to the baseline in P4P facilities to the change in facilities without P4P. We estimated separate effects for the short-term and long-term periods by including terms for the interaction between the intervention group and each of the two post-implementation periods (online supplemental appendix 2). We also estimated the difference between the short-term and long-term effects (online supplemental appendix 2). In the analysis of women’s and patients’ outcomes, we controlled for individual characteristics (education, religion, marital status, occupation, age, number of pregnancies) and household characteristics (insurance status, number of household members, household head education and wealth based on ownership of household assets and housing particulars). Standard errors were clustered at the facility level, or the facility catchment area.
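One way to write a model of this kind is sketched below. The specification actually estimated is given in online supplemental appendix 2, which is not reproduced here, so the notation is an assumption made for illustration rather than the authors' own.

```latex
Y_{ift} = \beta_0
        + \beta_1 \,(\mathrm{P4P}_f \times \delta_{1})
        + \beta_2 \,(\mathrm{P4P}_f \times \delta_{2})
        + X_{ift}'\theta
        + \gamma_f + \delta_t + \varepsilon_{ift}
```

Here $Y_{ift}$ is the outcome for individual $i$ in the catchment area of facility $f$ at survey round $t$ (0 = baseline, 1 = 13 months, 2 = 36 months), $\mathrm{P4P}_f$ indicates an intervention facility, $\delta_1$ and $\delta_2$ are indicators for the two post-implementation rounds, $X_{ift}$ holds the individual and household covariates, and $\gamma_f$ and $\delta_t$ are facility and year fixed effects. Under this notation $\beta_1$ is the short-term effect, $\beta_2$ the longer-term effect, and $\beta_2 - \beta_1$ the difference between them.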

We further estimated the heterogeneity of P4P effects across local area characteristics (wealth status, rural/urban location) and characteristics of facilities (level of care, ownership, baseline performance, above and below the median performance for deliveries and IPT during ANC)29 by including a three-way interaction term and controlling for time-varying facility-level covariates (availability of electricity and water supply, and the mean wealth index for households sampled in the catchment area of the facility) as potential confounding factors (online supplemental appendix 2).

The identifying assumption of the difference-in-difference approach is that the outcomes between study arms would have followed parallel trends in the absence of the intervention. We previously verified that trends in a number of outcomes at the household and facility levels were similar between the intervention and comparison areas prior to the introduction of P4P6 (online supplemental appendix 3). We also verified preintervention trends were parallel in facility service utilisation levels based on patient registers.6
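Under this assumption, and leaving aside covariates and fixed effects, the estimate reduces to a comparison of changes in group means. The sketch below shows that arithmetic for three survey rounds. The coverage figures are hypothetical and chosen only to make the calculation easy to follow; they are not study results.

```python
def did(treat_pre: float, treat_post: float,
        comp_pre: float, comp_post: float) -> float:
    """Change in the intervention arm minus change in the comparison arm."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Hypothetical coverage rates (%) by arm and survey round; NOT study data.
intervention = {"baseline": 80.0, "short": 90.0, "long": 88.0}
comparison = {"baseline": 82.0, "short": 84.0, "long": 85.0}

# Both effects are measured against the same baseline round.
short_term = did(intervention["baseline"], intervention["short"],
                 comparison["baseline"], comparison["short"])
long_term = did(intervention["baseline"], intervention["long"],
                comparison["baseline"], comparison["long"])

# A negative difference means the effect shrank between the two follow-ups.
print(short_term, long_term, long_term - short_term)
```

If the comparison arm would not have tracked the intervention arm in the absence of P4P, the subtracted comparison change no longer stands in for the counterfactual, which is why pre-intervention trends are checked.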

The outcomes considered are those reported previously6: notably the eight incentivised indicators as well as indicators which could be indirectly affected by incentives (coverage of ANC and PNC) and non-targeted services (outpatient visits). We examined programme effects on quality of care measures, including effects on the availability and stock out of essential drug and supplies27 and on the probability of paying and costs of key maternal care services, and related gifts.

To account for multiple testing, we corrected the p values by hypothesis using the Bonferroni correction (the p value threshold for statistical significance at the 5% level becomes 0.05/(number of tests)). The grouping of the tests by hypothesis is listed in online supplemental appendix 4.
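The correction can be expressed in a few lines. The function names are illustrative. The thresholds reported in the Results (0.0055, 0.017 and 0.0083) are consistent with groups of 9, 3 and 6 tests at the 5% level, but those group sizes are inferred from the thresholds and are not stated in the text; the actual grouping is in online supplemental appendix 4.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# ASSUMPTION: group sizes inferred from the thresholds reported in the Results.
for n_tests in (9, 3, 6):
    print(n_tests, round(bonferroni_threshold(0.05, n_tests), 4))
```

With an inferred group of nine tests, a p value of 0.001 remains significant while a p value of 0.013 does not, which matches how the short-term delivery effect and the change in measles coverage are described in the Results.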

We present descriptive analyses of health worker and facility survey data in rounds 2 and 3, to determine implementation reach.

All statistical analyses were done with Stata (V.16).

#### Qualitative

The data were double coded using NVivo V.9 software, employing an inductive framework relating to the core research questions, comparing and contrasting perceptions and strategies employed early on and later in the programme, together with design adaptations and challenges experienced over time.

### Patient and public involvement

Patients were not directly involved in the design or dissemination of the study.

### Data

The quantitative data for this paper are made available through Zenodo, DOI: 10.5281/zenodo.5636645, https://zenodo.org/record/5636646%23.YanUmtnMK3I

## Results

### Impact evaluation findings

At baseline, coverage of institutional deliveries was over 84% (table 2). Two vaccination indicators (polio vaccine at birth and three doses of pentavalent vaccine) also had a baseline coverage of >75%. Baseline coverage of other incentivised services varied between 22% and 51%. Baseline coverage levels for incentivised indicators were generally similar between intervention and comparison groups (table 2). However, baseline coverage of IPT2 and four or more ANC visits was higher in comparison areas, 56.7% versus 49.5% (p=0.005) and 65.0% versus 71.2% (p=0.02), respectively. Baseline coverage of PNC within 7 days was higher in the intervention area, 21.5% versus 16.9% (p=0.043).

Table 2

Service use before and after the introduction of payment for performance

In the short term, P4P affected two out of eight indicators incentivised at the facility level: a 10.3 percentage point increase in the provision of IPT during ANC (p=0.001), and an 8.2 percentage point increase in the rate of institutional deliveries (p=0.001) (tables 2 and 3 and online supplemental appendix 5). These short-term effects are robust to correcting for multiple testing (at the 5% level of significance, the Bonferroni threshold for the p values is equal to 0.0055) (online supplemental appendix 4). In the longer run, there was a smaller effect on institutional deliveries (4.9 percentage points, p=0.018), but the decline was not statistically significant (3.2 percentage points, p=0.114). The estimated effect on IPT coverage during ANC was also smaller in the longer term, and only borderline significant (5.6 percentage points, p=0.097). While no short-term or long-term effects were identified for these indicators, between the short and the longer term there was a substantial reduction in measles immunisation coverage of 15.6 percentage points (p=0.013, not significant with the Bonferroni correction) and an increase in coverage of HIV treatment during ANC of 4.3 percentage points (p=0.085). There was no longer-term impact on any of the other incentivised indicators that did not change in the short run.

Table 3

Direct and indirect effect of payment for performance on the use of targeted services in the short and long term (results from the difference-in-difference analysis)

We also considered the effect of P4P on services which were indirectly incentivised. In the short term, we found that P4P was associated with a significant increase in coverage of at least one ANC visit by 3 percentage points (p<0.001), which was sustained in the longer term (tables 2 and 3 and online supplemental appendix 5). This effect is also robust to the Bonferroni correction (threshold=0.017; online supplemental appendix 4). We examined the effect of P4P on unincentivised care and found no significant effect on outpatient department (OPD) visits overall (table 4). Among dispensaries, there was a short-term reduction in OPD visits (by 91 visits and 58 visits per month for over 5 year olds and under 5 year olds, respectively), but no programme effect on these outcomes in the longer term.

Table 4

Effect of payment for performance on the use of non-targeted services in the short and long term (results from the difference-in-difference analysis)

We further examined programme effects on structural and process quality of care for targeted services (ANC, PNC, immunisations and delivery care) and non-targeted outpatient services (tables 5 and 6 and online supplemental appendix 5). There was a short-term positive effect on health worker kindness to women during delivery, which was sustained in the longer term. There was also evidence of an improvement in patient satisfaction with patient–provider interactions during delivery care in the longer term (by 4 percentage points, p=0.035), whereas no short-term effect had been noted. We found no effect on patient satisfaction with antenatal, postnatal and immunisation services in the short or longer term. An improvement in satisfaction with interpersonal care among non-targeted service users was noted in the short term, but there was no effect in the longer term. While there was no short-term effect on waiting time, we found evidence of a reduction in waiting time due to the programme in the longer term for non-targeted services by around 18 min (p=0.038). Note that none of these effects are significant when correcting for multiple testing (Bonferroni threshold=0.0083).

Table 5

Quality of care before and after the introduction of payment for performance

Table 6

Effect of payment for performance on quality of care in the short and long term (results from the difference-in-difference analysis)

In terms of structural quality, there was evidence of significant improvements in the availability of drugs and medical supplies in the short term, as well as a reduction in their stock out rate. These positive effects reduced in the longer term; the programme effect on overall drug availability was no longer statistically significant, while the reduction in stock outs was estimated at 9.6 points (p=0.004) in the long term compared with 13.6 points in the short term, with the longer-term effect being driven by a greater increase in stock outs in comparison areas (table 6). Most of the effects on the availability of drugs and medical supplies in the short and long term are also robust to the Bonferroni correction (threshold=0.017; online supplemental appendix 4).

We found evidence of a significant increase in public providers’ adherence to exemptions, manifested by a reduction in the probability of paying out of pocket for deliveries of 5 percentage points (p=0.023) in the short term, increasing to 10 percentage points in the longer term (p<0.001) (tables 7 and 8). Although the probability of paying for delivery care increased a little in the longer term compared with the short term in the intervention area, the probability of paying rose more substantially in comparison areas (table 7). This effect is robust to the Bonferroni correction (online supplemental appendix 4).

Table 7

Cost of services before and after the introduction of payment for performance

Table 8

Effect of payment for performance on the cost of services in public facilities in the short and long term (results from the difference-in-difference analysis)

#### Heterogeneity of effects

The programme effect on deliveries was significantly pro-poor in both the short and longer term (table 9). The effect was also greater among rural facilities in both the short and longer term. In the short term, the effect was greater among facilities with low baseline performance, but this was no longer the case in the longer term. There were no differential effects on the IPT coverage indicator by local area or facility characteristics (table 9).

Table 9

Heterogeneity effect of P4P

### Process evaluation findings

#### Programme awareness

During in-depth interviews both district level managers and health workers demonstrated a good understanding of the P4P design components such as objectives, indicators, target setting and bonus distribution formulas. This is in contrast to their more limited knowledge earlier on in the programme, reflecting learning over time. Health worker survey data confirmed increased awareness levels from 85% at 13 months of implementation to 100% at 36 months.

### Programme implementation

#### Bank accounts

When implementation started, a number of facilities had not opened bank accounts, including those in remote areas and faith-based facilities. The health facility survey estimated that 89% of facilities had opened bank accounts by 13 months of implementation, increasing to 96% by 36 months.

#### Bonus payments

Both health workers and managers said there were only small delays in the payment of bonuses during the first five payment cycles (typically a delay of 1–2 months). The payment for cycle 6, however, was 3 months late. As of February 2015, the payment for cycle 7, which was due in September 2014, still had not been made, and informants raised concerns about the delayed payments. The delay led to speculation among some health workers and managers that the scheme might have come to an end:

I thought that it [the scheme] had been stopped. Now I’m surprised they say that it’s still there. I really thought it wasn’t there anymore. (Health worker, Kisarawe district)

#### Data verification

The verification visits conducted by the national Pilot Management Team on a random sample of 25% of facilities once per cycle ceased from cycle 7. However, district managers continued to conduct verification visits as part of their quarterly routine supportive supervision visits to facilities. This was a response to the shortage of P4P funds, which prevented managers from conducting separate verification visits as they had previously done. The process of verification, which initially varied across districts, was harmonised in 2015 and involved comparing monthly routine health information system reports with patient registers. Health workers felt that P4P had a lasting effect on data compilation, completeness and accuracy. Three of the seven facilities in Kisarawe district had posters on the walls to remind health workers of the importance of data.

#### Feedback meetings

Feedback workshops were supposed to be held once per cycle at the district level, involving participants from all facilities, to allow reflection on lessons learnt regarding performance across facilities and the sharing of experience. From cycle 6 onwards, the feedback meetings ceased due to a lack of funds.

### Programme mechanisms

#### Drug procurement

In the first phase of the programme (up to cycle 6), the facility level bonus had been used to procure drugs and supplies, with a focus on those drugs needed to deliver incentivised services. However, during the second phase, health workers indicated that delays in receiving funds made it difficult in some cases to continue to meet targets. The antimalarial sulfadoxine–pyrimethamine (SP), used for the IPT target, was the medication mentioned most often as having been affected by funding delays:

There was a time we ran completely out of SP. If those money [P4P bonus] had come at the correct time, we would have had money to buy SP for the pregnant women. (Health worker, Kisarawe district)

The facility survey data showed that, in the longer term, the availability and stock out rate of IPT drugs had returned to baseline levels.

#### Health worker motivation

The delay of funds from cycle 6, and the perception that the P4P intervention had come to an end, affected health workers’ motivation, but not in a uniform manner. At facilities with a low number of staff, the bonus could amount to approximately 50% of a month’s net salary, while it was much lower at facilities with a large number of staff, as the bonus was shared across staff. Staff that received higher bonuses were more likely to voice discontent over the funding delays.

However, a number of respondents suggested that many of the behaviours linked to P4P had become normalised even in the absence of payment.

Before we took it as something monetary, but now we have become used to this as our daily work, we see it as something normal. (…) this is work we are doing out of conscience (…) now P4P is in our blood (…). (Health worker, Bagamoyo district)

There was generally still a sense of hope that even if the funds were delayed, the funding would be forthcoming, and the continued data verification activities by managers supported this:

…for us there is that saying “maybe I’ll get it tomorrow” so we are doing our work. (…). (Health worker, Bagamoyo district)

#### Strategies to achieve performance targets

Health workers pointed to a number of ongoing strategies that were used to increase demand among households. Strategies included raising awareness about the dangers of home births and the lack of skills of traditional birth attendants (TBAs). Numerous strategies involving TBAs were mentioned by respondents, including giving TBAs 5000 Tanzanian shillings when they brought a woman to a facility for delivery, warning TBAs that they would be legally responsible if a woman ran into problems while under their care and fining TBAs who assisted in home-based deliveries (though this had not been implemented). However, in several cases, payments to TBAs had ceased with the delayed P4P payments.

## Discussion

This study contributes to the limited evidence examining P4P effects over time, while also trying to explore reasons for variation in effects. Our study found evidence of initial improvements in performance tied to incentivised indicators, coupled with reductions in unincentivised service use in dispensaries. However, our findings generally point to an attenuation of programme effects over time for those indicators that improved in the short term, some improvements in quality of care indicators that did not improve in the short term and the disappearance of negative spill-over effects on unincentivised services. Studies from other LMICs have reported similar short-term increases in targeted outcomes, with sustained effects over time in Mozambique,15 and stagnating longer-term effects in Burundi.35

The effects of the programme on the rate of institutional deliveries reduced over time. Although coverage of ANC was maintained, performance on the IPT during ANC target was not sustained, with stock out rates of IPT increasing over time to baseline levels after an initial reduction. The lack of sustained effect on this indicator is unlikely to be due to the incentive design (single threshold target), as coverage levels are still below the 80% threshold. It is more likely due to funding delays in the longer term, caused by changes in programme management and funding. Research in Cameroon also reported reduced investment in drugs over time due to delays in incentive payments.36

We found that improvements in delivery care utilisation were greater among facilities with lower baseline performance in the short term; however, this differential effect was no longer apparent in the longer term. This is in contrast to US studies which found that facilities with lower baseline performance were more likely to have sustained effects over time.23

We found evidence of sustained improvements in kindness during deliveries, and indications of a wider set of improvements in patient experience of delivery care in the longer term. Qualitative data suggest it took time for health workers to fully understand the programme, which may explain why some of these changes were only observed in the longer term. Research elsewhere also reported that it took time for staff to understand the programme.37 The programme effects on process quality are a noteworthy positive spill-over effect, as quality of care indicators were not directly incentivised by the P4P programme in Pwani, unlike many other P4P schemes in sub-Saharan Africa.38

Our research suggests that the degree of integration of the P4P scheme within routine systems evolved over time. This was partly tied to adaptations in response to the delayed payment of incentives. For example, managers integrated verification visits within their routine supportive supervision visits, reducing the time impact of the programme on managers and health workers. The lack of longer-term effect on utilisation of non-incentivised services suggests dispensary staff became more efficient in managing the additional data and reporting requirements over time.

The qualitative data suggest that the introduction of P4P increased extrinsic motivation in the short term, but this happened alongside increased financial management autonomy and greater relatedness (interactions with managers), with no evidence of harm to intrinsic motivation. Similarly, in Zambia, health workers reported greater job satisfaction linked to enhanced supervision and financial autonomy.39 The ongoing benefits of financial autonomy linked to the programme and enhanced supervision, together with hope that funds would eventually arrive, likely sustained motivational effects, despite funding delays. Similarly, in Malawi the goal focus of the programme was motivating in itself, independently of incentives.40 However, reductions in performance and motivation linked to uncertainty in obtaining the incentives were reported in Nigeria and Sierra Leone.41 42

Our study has a number of limitations. First, it was not possible to randomly allocate the P4P scheme, and hence we used difference-in-difference methods, which rely on the assumption that trends in outcomes in intervention and comparison areas would have run parallel if the programme had not been implemented. We were, however, able to verify that preintervention trends were similar for a number of outcomes. Second, the measures of non-targeted service use relied on patient register data which were incomplete for some facilities, limiting the available sample for analysis. Third, our assessment of motivational effects is based solely on qualitative findings.
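To make the parallel-trends logic concrete, the following is a minimal sketch of a difference-in-difference estimate computed from group means at three survey rounds. It is an illustration only and not the regression specification used in this study: the function name `did`, the dictionary layout and all outcome values are hypothetical and were chosen purely to show how a short-term and a longer-term effect are each measured against the change in the comparison group.

```python
from statistics import mean


def did(treated_base, treated_follow, control_base, control_follow):
    """Two-group, two-period difference-in-difference from raw outcome lists.

    The change in the comparison group stands in for what would have
    happened in the intervention group without the programme
    (the parallel-trends assumption).
    """
    change_treated = mean(treated_follow) - mean(treated_base)
    change_control = mean(control_follow) - mean(control_base)
    return change_treated - change_control


# Hypothetical facility-level coverage rates (NOT study data).
intervention = {"baseline": [0.80, 0.84], "13m": [0.90, 0.94], "36m": [0.88, 0.90]}
comparison = {"baseline": [0.82, 0.86], "13m": [0.85, 0.89], "36m": [0.87, 0.91]}

# Each effect is estimated relative to the same baseline round.
short_term = did(intervention["baseline"], intervention["13m"],
                 comparison["baseline"], comparison["13m"])
long_term = did(intervention["baseline"], intervention["36m"],
                comparison["baseline"], comparison["36m"])

print(f"short-term effect: {short_term:.2f}")
print(f"longer-term effect: {long_term:.2f}")
```

In this made-up example the longer-term estimate is smaller than the short-term one, which is the pattern of attenuation discussed above. A regression formulation would additionally allow for covariates and clustered standard errors.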

Our findings highlight the importance of not just evaluating the effects of P4P at one point in time, but of undertaking ongoing assessment of effects over time. It is clearly important for evaluators to monitor changes in programme design and implementation and how these relate to outcomes, especially as schemes go from pilot to scale, and are taken over by government. This point is true of any intervention that aims to change the way health systems work and health workers behave, and where outcomes are likely to be non-stationary over time. More generally, the results demonstrate the limitations of conventional evaluations of cause and effect, and the need to embrace a complex adaptive systems approach to understanding health systems and their response to P4P.43 Further research should apply complexity science methods such as system dynamics and agent-based modelling44 to increase our understanding of the dynamic, temporal effects of P4P,45 and the factors shaping them, so we can build programmes that have sustained effects in the long term.

## Data availability statement

Data are available in a public, open access repository. The quantitative data for this paper are made available through Zenodo, DOI: 10.5281/zenodo.5636645, url: https://zenodo.org/record/5636646%23.YanUmtnMK3I.

## Ethics statements

### Ethics approval

The study received ethical approval from the Ifakara Health Institute institutional review board (IHI/IRB/No: 1BI1IRB/38 – 2011), the National Institute for Medical Research (NIMR/HQ/R.8a/Vol.IX/2256), the London School of Hygiene & Tropical Medicine (6435) and the Norwegian Centre for Research Data. Participants gave informed consent to participate in the study before taking part.

## Acknowledgments

We acknowledge the Tanzanian Ministry of Health, Community Development, Gender, Elderly and Children, the Pwani Regional Health Management Team and the Council Health Management teams for supporting the study. We also acknowledge the time of the health workers, patients and women participating in the survey, and the field workers and supervisors who collected the data.

## Footnotes

• Handling editor Valery Ridde