Which doctor? Combining vignettes and item response to measure clinical competence

https://doi.org/10.1016/j.jdeveco.2004.11.004

Abstract

We develop a method in which vignettes–a battery of questions for hypothetical cases–are evaluated with item response theory to measure the clinical competence of doctors. The method, which allows us to simultaneously estimate competence and validate the test instrument, is applied to a sample of medical practitioners in Delhi, India. The method gives plausible results, rationalizes different perceptions of competence in the public and private sectors and pinpoints several serious problems with health care delivery in urban India. The findings confirm, for instance, that the competence of private providers located in poorer areas of the city is significantly lower than that of providers in richer neighborhoods. Surprisingly, similar results hold for providers in the public sector, with important implications for inequalities in the availability of health care.

Introduction

The “quality” of medical care is an important determinant of both the demand for health services and health outcomes. How to measure quality, though, is more problematic. Quality has been variously defined by physical infrastructure, the stock of medical supplies, the total number of assigned personnel, the availability of refrigeration units, the availability of electricity or a combination of some of these (Collier et al., 2003; Lavy and Germain, 1994). A remarkable but understandable omission from these indicators is the quality of medical personnel, particularly since they account for the largest component of cost and arguably make the greatest contribution to health outcomes in these facilities. This paper examines one particular dimension of provider quality: medical knowledge.

The omission of provider quality is remarkable since, as an explanation of demand, it is the nature of the advice given that is actually specific to the facility. The presence or absence of drugs, for example, indicates the degree of subsidy that a service represents (if drugs are distributed free of charge in public facilities). This measure is not informative about the inherent quality of advice or likely outcome of visiting the facility as opposed to self-medicating. Further, public facilities with high demand due to high quality are more likely to face “stock outs” if drugs are free, resulting in potential misclassifications. From the point of view of household optimization, the response to improvements in facility characteristics that can be purchased in a market (medicines are a prime example) will be less than improvements in non-tradable characteristics such as provider quality (on this, see Foster, 1995; for an application to schooling, Das et al., 2004).

These problems are exacerbated in environments such as urban India, the focus of this study, where (a) the private sector is the primary provider of care and there is little variation in out-patient infrastructure, (b) health insurance is virtually non-existent, so that most spending is out-of-pocket, and (c) the de jure regulation and certification of providers does not translate into de facto enforcement, so that households are faced with a bewildering variety of providers characterized by very different competence levels (Mahal et al., 2004; Kakar, 1988; Rohde and Viswanathan, 1995; Jesani, 1997).

Moreover, recurrent expenditure on medical supplies and infrastructure is small compared to that on personnel. India spent over 60% of its recurrent health budget (which accounts for 97% of all expenditure) on salaries in 1990 (Reddy and Selvraju, in press). Given the predominance of salaries in the health budget, a critical policy question is whether expenditures are efficiently allocated. Are public facilities providing a service that cannot be obtained in the (usually very large) private market?

The omission is also understandable. Apart from logistic problems and the high human capital requirements of measuring provider quality, there is little consensus on how it should be measured. This is particularly true of low-income countries characterized by multiple medical systems and degrees. While some progress has been made (Peabody et al., 2000; Leonard and Masatu, in press), a number of problems remain unresolved. One fundamental issue is the construction of a metric that can be used to gauge the validity of the measurement tool: how does measured quality compare to the “true” quality of the medical advice received from the provider? The problem is particularly severe when providers respond optimally to incentives, so that practice is a poor reflection of knowledge. In this case, using observed practice to validate measures of knowledge (as in Peabody et al., 2000) is invalid.

This paper addresses the problem of measuring the “knowledge” or “competence” dimension of provider quality. There are, of course, other dimensions of quality, including whether providers show up for work and the conscientiousness they exhibit while on the job (Chaudhury and Hammer, 2004; Banerjee et al., 2004). While these other aspects of quality depend on the incentives that providers face, the frontier for what providers can and cannot do is clearly linked to their medical knowledge. This paper develops a methodological tool for measuring provider competence, assesses this frontier using data from urban India and studies its distribution across neighborhoods and sectors (public and private). We find this frontier to be a surprisingly constraining factor: even if providers behaved properly, the quality of care received, particularly by the poor, would still be very low.

We measure competence through the use of clinical vignettes in combination with Item Response Theory (IRT) methods (Hambleton et al., 1991). The combination of vignettes along with item response provides two benefits. First, clinical vignettes, by standardizing the cases used to judge quality, allow us to abstract from the provider's case-load mix that may reflect unobserved selection criteria. Second, item response theory uses the test data to generate a measure of the “usefulness” of the test (in a sense, made more precise below) for measuring competence, eliminating the need for a separate, independent metric of comparison. Statistically, this technique is related to principal components or factor analysis in that it extracts a measure of a latent variable–a student's general grasp of knowledge in school examinations, knowledge of medical procedures in our analysis–from a set of response vectors.
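As a rough illustration of how a latent competence level can be recovered from a vector of responses, the sketch below assumes a two-parameter logistic item response function. The functional form, the item parameters (discrimination a, difficulty b) and the function names are illustrative assumptions, not the specification or estimates used in this paper. Item parameters are treated as known here, whereas in practice they would be estimated from the same response data.

```python
import math

def p_correct(theta, a, b):
    """Assumed 2PL item response function: probability that a provider
    with competence theta answers an item correctly, given hypothetical
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response vector at competence theta."""
    ll = 0.0
    for y, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of theta on [lo, hi].
    A grid keeps the sketch dependency-free; it is not an efficient
    estimator and is bounded by the chosen interval."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (a, b) pairs for three items, from easy to hard.
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
stronger = estimate_theta([1, 1, 0], items)
weaker = estimate_theta([1, 0, 0], items)
```

Under these assumed parameters, a provider who answers the first two items correctly receives a higher competence estimate than one who answers only the first.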

These techniques are used with data collected from 205 public and private providers in 7 localities of Delhi by the authors over a 2-year period. The choice of these particular providers and neighborhoods is discussed below; in essence, the sample is tied to a pre-existing household survey and represents the available universe of providers for households in the parallel study. The providers cover a very broad range of skills and qualifications and the particularities of an urban environment allow us to investigate a number of questions regarding the relationship between competence, market organization and provider choice in a broader research project.

This paper focuses on the measurement problem that is a necessary first stage for examining the question of provider choice and organization of the market for health care. The results are encouraging. The study finds that despite tremendous variation, the use of vignettes allows for a reasonably precise measurement of competence. The method performs well in distinguishing between providers with high and low competence although it is better at distinguishing among relatively competent providers than distinguishing among those in the lower end of the competence distribution.

Based on this competence index, the paper presents some results on the treatment patterns of providers and the structure of the health care market in Delhi. Some surprising findings emerge. With regard to the treatment patterns, the study shows that a provider classified as highly competent by the index satisfies only a very weak condition—the ability to distinguish life-threatening situations and act accordingly, either through treatment at the clinic itself or through referrals. A significant proportion is unable to diagnose such conditions and the competence index effectively captures the provider's ability to recognize such conditions. In this sample, 28% were unable to diagnose a textbook case of uncomplicated pulmonary tuberculosis and 44% were unable to diagnose and refer a standard case of pre-eclampsia (a complication of pregnancy that both requires immediate care and is responsible for a substantial fraction of maternal deaths).

What correlates with competence? Households in poor neighborhoods are worse off in a number of ways. On the one hand, more competent providers are predominantly in richer areas, and on the other, even within the public sector less competent providers are in poorer localities. Apart from the neighborhood, medical training accounts for the most significant proportion of the variation in competence with the expected sign. Finally, on-the-job experience (or tenure in the particular neighborhood) has little impact. If anything, it leads to a slight decline, perhaps because any learning-by-doing is more than balanced by improvements in the training received by recent graduates (vintage effects). The distribution of competence across poor and rich neighborhoods is the same for young and old providers, consistent with the hypothesis that observed differences arise due to sorting rather than differential depreciation in competence.

The remainder of our paper is structured as follows. Section 2 highlights some problems in the literature concerning the measurement of competence. Section 3 presents the vignettes methodology and the sampling strategy as well as a basic summary of the data. Section 4 outlines the statistical theory on which the analysis is based as well as the econometric issues that arise in the use of the competence variable. Section 5 presents results and Section 6 concludes with suggestions for improvements in the instrument for future studies of this kind.

Section snippets

Measures of quality

The measurement of quality of care has never been an easy task. Such measures are proposed for very different purposes and have, inter alia, measured knowledge of medical practitioners, their behavior in clinical settings and the outcomes of medical treatments. Confusing the three, which are all determined by and should be measured by different means, has been a serious problem. While the ultimate outcome of interest is the actual health improvement of a patient, it is important to distinguish

Vignettes construction and sample

Vignettes were designed in consultation with doctors for five diseases that are common in Delhi. These were a child with diarrhea, a man with viral pharyngitis, a man with tuberculosis, a young girl with depression and a pregnant mother with pre-eclampsia. A vignette proceeded as follows: The interview team consisted of two individuals. One played the role of the “patient” (or for the child with diarrhea, the mother) and the other was there to record the interaction and to provide additional

Item response overview

Item response theory (IRT) was first (and continues to be) used in the field of psychometrics (Rasch, 1960) to understand the relationship between some condition of a patient (for instance depression) and her response to a set of questions. The theory is based on the assumption that there is an underlying latent random variable, θ, and every “question” in a test maps this latent variable to a response. In the context of this paper, the latent variable θ is interpreted as provider competence and

Results

The construction of the competence index uses the history, examination and treatment sections of the vignettes (Appendix A, Table A lists every item used). For the treatment questions, a single variable is used for every case that summarizes the quality of treatment. This variable is based on ratings on a scale of −3 to 3 by three sets of independent doctors, two from South Asia in similar epidemiological environments and one a team from The Johns Hopkins University School of Medicine. For the

Conclusions, caveats and extensions

This paper developed a method for measuring clinical competence of medical care providers using Item Response Theory. The method was applied to data collected by the authors on a sample of providers for medical services in Delhi and used to compare rich and poor areas as well as the public and private sectors. The method seems to us (at least) to have promise for future studies of this kind and has notable advantages over traditional treatment of test results such as standardized raw scores.

Acknowledgements

The modules used in this study were designed in consultation with Dr. Tejvir Singh Khurana and many discussions with Ken Leonard and Asim Khwaja. The pilot and survey were implemented by Jishnu Das and Jeffrey Hammer with N. Deepak, Pritha Dasgupta, Sourabh Priyadarshi, Poonam Kumari and Sarasij Majumdar, all members of the Institute of Socio-Economic Research on Development and Democracy Delhi (ISERDD). Further support from Purshottam, Rajan Singh, Ranjit Gautam and Simi Bajaj, often under

References (30)

  • Banerjee, Abhijit, et al., 2004. Wealth, health and health services in Rural Rajasthan. American Economic Review, Papers and Proceedings.
  • Birnbaum, Allan. Some latent trait models and their use in inferring an examinee's ability.
  • Bock, Richard D., et al., 1981. Marginal maximum likelihood estimation of item parameters: an application of an EM algorithm. Psychometrika.
  • Bock, Richard D., et al., 1970. Fitting a response model for n dichotomously scored items. Psychometrika.
  • Chaudhury, N., Hammer, J., 2004. Ghost doctors: absenteeism in Bangladeshi health facilities. Policy Research Working...
  • Collier, Paul, et al., 2003. Density versus quality in health care provision: using household data to make budgetary choices in Ethiopia. The World Bank Economic Review.
  • Das, Veena, Das, Ranendra K., in press. Pharmaceuticals in urban ecologies: The register of the...
  • Das, Jishnu, Hammer, Jeffrey, in preparation. Money for nothing: the dire straits of medical practice in...
  • Das, Jishnu, Sánchez-Páramo, Carolina, 2003. Short but not sweet: new evidence on short duration morbidities from...
  • Das, Jishnu, Habyarimana, James, Dercon, Stefan, Krishnan, Pramila, 2004. When can school inputs improve test scores....
  • Drasgow, F., et al., 1983. Modified parallel analysis: a procedure for examining the latent dimensionality of dichotomously scored item responses. Journal of Applied Psychology.
  • Foster, Andrew D., 1995. Prices, credit markets and child growth in low-income rural areas. The Economic Journal.
  • Hambleton, Ronald K., et al., 1985. Item response theory: principles and applications.
  • Hambleton, Ronald K., et al., 1991. Fundamentals of item response theory.
  • Hattie, J.A., 1985. Methodological review: assessing unidimensionality of tests and items. Applied Psychological Measurement.