BACKGROUND

In 2002 the Institute of Medicine released Unequal Treatment1, a seminal report documenting extensive evidence of disparities in the burden of disease, quality and appropriateness of care, and health outcomes among specific US populations, in particular ethnic minorities. Multiple, interdependent factors have been shown to contribute to health disparities, including patient, clinician, health system and environmental variables2,3. In response to calls to address the diverse health care needs of the US population, curricular tools have been developed with the intention of improving clinician and patient communication and behaviors to reduce these disparities4–7. There is an implicit understanding that providing culturally effective care will lead to improved quality of care1,8–11. But there remains a need for evidence linking carefully developed curricula with patient-centered and clinical outcomes6,10–12.

In a systematic review examining the effectiveness of cultural competence (CC) curricula13, Beach and colleagues found 52 studies addressing impact on provider competencies but only 3 addressing patient outcomes; they concluded that evidence that CC training improves patient adherence and health care equity was lacking. CC training reviews have focused on the effect of training on learners’ acquisition of skills, knowledge and attitudes13, and the rigor of the methods and assessments of curricular dissemination and replication10,14,15. Two reviews addressed training effect on health care systems and mental health services16,17; both concluded that the evidence for effectiveness of training on service delivery and health status was limited. The recent randomized controlled trial by Sequist and colleagues18 reporting that CC training and performance feedback did not improve documented disparities in diabetes care outcomes between black and white patients has prompted a reexamination of the impact of CC curricula on patient outcomes.

OBJECTIVES

Since the 2005 review by Beach et al.13, a variety of valid measures for examining the quality of research studies have emerged19–23. Our goal was to reevaluate and update the literature since that review, assess the quality of studies and the overall impact of training on patient outcomes, and propose a framework for future research. Our questions were: What is the evidence for a direct link between provider CC training and patient outcomes (number and type of studies)? Are existing studies well designed and adequately powered to examine patient outcomes (quality of studies)? Is there robust evidence for a lack of association between provider training and patient outcomes?

Based on our review and previously described theoretical models, we propose an algorithm for conducting studies on educational interventions that specifically examine patient outcomes as primary endpoints.

METHODS

Data Sources and Searches

We conducted a systematic literature review to assess studies of any CC intervention for health care providers or learners in which the impact on patients and/or health care utilization was measured. We used formal methods for literature search, selection, quality assessment and synthesis, and followed accepted guidelines24,25. Between February and March 2010, we conducted an electronic search of the MEDLINE/PubMed, PsycINFO, Education Resources Information Center (ERIC), Cumulative Index of Nursing and Allied Health Literature (CINAHL) and Web of Science databases for articles in English published between January 1990 and March 2010. Using MEDLINE/PubMed, we developed an initial search template (below) and applied it to the databases to maximize sensitivity:

(cultural competence OR cultural competency OR cultural diversity OR cultural diversities OR health disparities OR health disparity) AND (training OR curriculum OR teaching) AND (patient outcomes OR outcome assessment OR health care quality assurance) AND (professional patient relations OR patient compliance OR patient adherence OR patient satisfaction OR patient cooperation).
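
For readers who wish to reproduce the electronic search, a query of this form can be run programmatically against PubMed. The sketch below uses Biopython's Entrez wrapper for the NCBI E-utilities with our search terms and date window; the contact email and retmax cap are illustrative placeholders, and the template would still need manual adaptation to the syntax of the other databases.

```python
from Bio import Entrez  # Biopython wrapper for the NCBI E-utilities

Entrez.email = "reviewer@example.org"  # placeholder; NCBI requires a contact address

# Search template applied to MEDLINE/PubMed (terms exactly as in the protocol).
QUERY = (
    "(cultural competence OR cultural competency OR cultural diversity "
    "OR cultural diversities OR health disparities OR health disparity) "
    "AND (training OR curriculum OR teaching) "
    "AND (patient outcomes OR outcome assessment OR health care quality assurance) "
    "AND (professional patient relations OR patient compliance OR patient adherence "
    "OR patient satisfaction OR patient cooperation)"
)

# Restrict to English-language articles published January 1990 through March 2010.
handle = Entrez.esearch(
    db="pubmed",
    term=QUERY + " AND English[Language]",
    datetype="pdat",
    mindate="1990/01/01",
    maxdate="2010/03/31",
    retmax=500,  # illustrative cap; raise it if the hit count exceeds this
)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} abstracts found")
print(record["IdList"][:10])  # first ten PubMed IDs for abstract review
```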

In addition, we searched the Cochrane database of systematic reviews26, the BEME resource for evidence-based education studies and systematic reviews27, and an educational research clearinghouse28. We also searched the bibliographies of key review articles12–17, contacted authors and queried experts in the field for additional studies that might have been missed.

Study Selection and Data Extraction

Articles were selected from abstract lists generated by the electronic and hand searches, based on pre-specified inclusion and exclusion criteria (Fig. 1). Eligible studies had to (1) represent original studies with learners/providers and patients; (2) include provider/learner cultural competency education/training; and (3) measure specified patient-centered outcomes (such as satisfaction) or disease outcomes (such as blood pressure), and/or health care utilization or processes of care. Studies with multiple interventions such as provider and patient education, or that had systems interventions (such as telephone reminders) were eligible. We excluded articles that were not in English; did not have original provider/learner/patient data; contained curricula with patient but not provider education; or that described only generic communication skills curricula. Reviews, editorials and unpublished abstracts and conference proceedings were excluded.

Figure 1. Summary of literature search and selection. *Articles that met criteria from database searches for abstract review. Includes bibliography search and author contacts. Excluded due to no curricular intervention or no outcomes measured.

Data Synthesis and Analysis

Three authors (DAL, ELR and AG) reviewed all abstracts from the database searches and retrieved full-text articles for further review. Each abstract list was independently reviewed by two authors. Four authors (DAL, ELR, AG, SB) then read the retrieved articles for final article selection and quality assessment. The bibliographies of the retrieved full-text articles were hand-searched. Finally, we contacted authors for additional information where indicated. We divided the articles so that two reviewers were assigned to each full-text article for independent quality assessment. To rate the studies for quality, we considered several criteria19–23. We chose and adapted the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology: Explanation and Elaboration) criteria for convenience and scope of scoring19,29,30. Items 4 to 21 from the STROBE checklist30 covered study design, setting, participants, confounding variables, bias, study size, statistical analysis, outcome measures, results, limitations and study generalizability as key constructs. Using scores of 0 (not done), 1 (done partially) and 2 (done well), with scores doubled for statistical methods and outcomes, our scheme (see Appendix) produced a score range of 0 (lowest) to 40 (highest quality). As a second measure we used the MERSQI (Medical Education Research Study Quality Instrument), a validated ten-item tool designed for medical education interventions with a score range of 5 (lowest) to 18 (highest)20. The MERSQI uses similar constructs of study design, sampling, validity, data analysis and outcomes to assess research quality. Data abstraction was standardized, and rating reliability was optimized by discussion to achieve rater agreement on individual scale items before independent rating. We designated studies as being of low, moderate or high quality using tertiles of scores for the STROBE and MERSQI. If the two primary reviewers of an article independently placed it into different quality categories (tertiles), the remaining two (secondary) reviewers also independently scored the article, and consensus was then derived by a joint decision of all four reviewers.
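
To make the weighting and categorization concrete, here is a minimal sketch of the adapted STROBE scoring scheme described above. The item keys, the example ratings and the use of equal thirds of the 0–40 scale as tertile cut points are illustrative assumptions (our actual tertiles were derived from the distribution of observed scores); only the 0/1/2 ratings, the doubled weights and the adjudication trigger follow the text.

```python
# Adapted STROBE scoring: checklist items 4-21 (18 items) are each rated
# 0 (not done), 1 (done partially) or 2 (done well); the statistical-methods
# and outcome-measures items are double-weighted, giving a 0-40 range.

DOUBLE_WEIGHTED = {"statistical_methods", "outcome_measures"}  # illustrative item keys

def strobe_score(ratings: dict[str, int]) -> int:
    """Sum the item ratings (each 0-2), doubling the two weighted items."""
    assert all(0 <= r <= 2 for r in ratings.values()), "ratings must be 0, 1 or 2"
    return sum(r * (2 if item in DOUBLE_WEIGHTED else 1)
               for item, r in ratings.items())

def quality_tertile(score: int, max_score: int = 40) -> str:
    """Map a score onto low/moderate/high thirds of the scale (illustrative cut points)."""
    third = max_score / 3
    if score < third:
        return "low"
    if score < 2 * third:
        return "moderate"
    return "high"

def needs_adjudication(score_a: int, score_b: int) -> bool:
    """Flag an article for secondary review when the primary reviewers' tertiles differ."""
    return quality_tertile(score_a) != quality_tertile(score_b)

# A partial abstraction form showing the double weighting: 2 + 1*2 + 2*2 = 8.
ratings = {"study_design": 2, "statistical_methods": 1, "outcome_measures": 2}
print(strobe_score(ratings))                    # 8

# Two hypothetical primary-reviewer scores for the same article.
print(quality_tertile(26), quality_tertile(8))  # moderate low
print(needs_adjudication(26, 8))                # True -> secondary reviewers also score it
```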

Each primary reviewer pair assessed the effect size of curricular interventions on patient-centered outcomes as – (negative/causing harm), 0 (no effect), + (small benefit), ++ (moderate benefit) or +++ (high benefit), interpreting the magnitude of meaningful clinical and/or patient-reported benefit or harm as reported in each study through discussion and consensus, with adjudication and input from the other two reviewers as needed.

RESULTS

Search Results and Data Abstraction

The electronic search yielded 251 abstracts from MEDLINE/PubMed, 96 from PsycINFO, 275 from CINAHL, 24 from ERIC and 98 from Web of Science (Fig. 1). A total of 15 abstracts were selected, and all reviewers agreed on the articles to be retrieved for full review: six abstracts from MEDLINE/PubMed, four from PsycINFO, four from CINAHL, none from ERIC and one from Web of Science. Three of the abstracts were duplicated across four of the five databases, so 12 articles were retrieved for full-text review. Subsequent bibliographic review of these and the previously conducted reviews10,12–17 yielded another five articles, for a total of 17 articles representing 17 different studies. Of the 17 articles that underwent full-text review, 10 were excluded from further quality assessment because of (1) having no curricular intervention (n = 5), (2) having no patient or health care utilization outcomes (n = 4), and (3) being interim reports with results pending (n = 2), leaving 7 studies in the final quality analysis18,33–38. Among the seven studies, quality rating discrepancies occurred in two studies using the STROBE and two using the MERSQI, with final ratings achieved by adjudication involving the secondary reviewers.

Qualitative Synthesis of Selected Studies

Clinical or patient-based endpoints were at least one of the outcomes of interest in all seven studies. Three studies involved physicians, two involved multiple health professionals and students (nurses, home health care, community health workers and ‘allied health’) and two involved mental health professionals. Two were quasi-randomized, two were cluster randomized, and three were pre/post field studies. The number of learners/providers in each study ranged from 8 to over 3,700, and the number of patients from 37 to 7,557. The curricular interventions varied in content, theory and method (not described in one study); examples of content include Pedersen’s triad model of cross-cultural counseling and a language/culture immersion course. Duration of curricular exposure ranged from 4 h to 10 weeks (Table 1). No study examined the dose-response association between training and patient outcomes, or the differential effects of training different health professionals on patient outcomes. The patient outcomes assessed included patient or family satisfaction, patient self-efficacy, clinical outcomes (blood pressure, weight change and HbA1c) and patient assessment of provider cultural competency (Table 2). Mean study quality scores ranged from 8 to 26 on the STROBE and from 5.5 to 12.0 on the MERSQI. Quality of studies was rated as moderate (n = 4) or low (n = 3), with none of high quality; the two tools placed all seven studies in similar tertiles. Effect size ranged from 0 to ++ (unable to assess in two studies), with no study reporting a harmful (–) or highly beneficial (+++) effect (Table 2). The three studies reporting a positive effect were free of “spin” in their interpretation of positive findings31. Variability across studies in providers/learners, patients, methods and outcomes precluded aggregate quantitative assessment, sensitivity analyses to test for heterogeneity24,32, and pooling of data for effect size; hence, a qualitative synthesis and analysis are presented.

Table 1 Learner/Provider Characteristics and Cultural Competence Curricula
Table 2 Patient Characteristics and Impact of Provider Training on Patient-based Outcomes

Wade 199133

This quasi-randomized controlled trial measured the effect of 4 h of cultural sensitivity training on a convenience sample of eight Master’s-level female psychology counselors and their black female patients in a college counseling center. Patients reported a significant positive effect on counseling skills and cultural sensitivity for intervention compared with control counselors. The small provider sample, self-selection, patient attrition and unique setting limited interpretation and generalizability of the findings and lowered study quality (rating: low).

Mazor 200234

This pre/post field study examined a Spanish language proficiency class for faculty not fluent in Spanish. Satisfaction ratings from 143 families, administered in Spanish, showed significant improvement for ‘physician concern’ (OR 2.1), ‘feeling comfortable with physician’ (OR 2.6), ‘physician being respectful’ (OR 3.0) and ‘physician listened to family’ (OR 2.9). A high response rate, valid measures and curricular specificity contributed to the study’s quality (rating: moderate).

Way 200235

This pre/post analysis of inpatient perceptions of mental health staff and the environment in psychiatric units was conducted before and after a mandated statewide core curriculum consisting of six modules, one of which included CC. The curriculum was delivered to over 3,700 hospital staff and providers at 20 hospitals over 2 years. At three hospitals, 77 patients perceived greater ‘environmental changes favoring their interests’ and ‘receptiveness toward the staff.’ The effect of the CC training could not be assessed because of inconsistent data and the small number of patients relative to providers trained, both factors contributing to low study quality (rating: low).

Majumdar 200436

This quasi-randomized study compared 36 h of cultural sensitivity training vs. no training for providers and their patients. Longitudinal use of six validated scales assessing patient satisfaction, resourcefulness, access to services, and mental and physical health at 3 to 18 months contributed to study quality. A positive effect was seen for patient use of social and economic resources. However, high attrition among patients (from 133 at baseline to 37) and providers (from 114 to 76) limited validity (rating: moderate).

Thom 200637

This cluster randomized trial assigned primary care physicians from two sites (n = 23) to a half-day or three-session curriculum plus feedback, and physicians from two sites (n = 30) to feedback only. Patient reports of their physician’s cultural competency comprised both the feedback and the primary outcome. Secondary outcomes were patient satisfaction, patient trust and the clinical outcomes of blood pressure, HbA1c and weight loss at 6 months. Two hundred forty-seven patients rated the ‘training + feedback’ group and 182 rated the ‘feedback only’ group. No significant differences were found. Study quality was enhanced by adequate patient sampling and use of valid outcome measures (rating: moderate).

McElmurry 200938

This 3-year pre/post multisite field study of 386 clinic-based providers and students used Spanish immersion or language classes and cultural workshops as interventions to increase competency in caring for 1,994 Limited English Proficiency diabetic patients. A second intervention, community health workers (CHW), was added as a systems change component. Patient outcomes were assessed only for the CHW intervention. For the subset of patients with at least two CHW visits (n = 392), the study demonstrated reduced patient no-show rates and improvements in diabetes self-monitoring and HbA1c. However, the impact of the CC curriculum was assessed only by provider/student ratings and language skills; the effect on patient outcomes therefore could not be independently assessed, limiting study quality (rating: low).

Sequist 201018

This 12-month cluster randomized controlled trial evaluated the effect of CC training plus clinical performance feedback for 15 primary care teams compared with feedback only for 16 teams. The 31 teams comprised 91 physicians and 33 nurse practitioners or physician assistants at eight ambulatory centers. Race-stratified, physician-level diabetes performance reports with recommendations were provided at 4 and 9 months. Eighty-two percent of clinicians were white. No significant differences were found in the primary outcome of disparity reduction between black and white patients. Study quality was enhanced by a low provider dropout rate (15%), high patient numbers (7,557) and use of valid measures, but subgroup analyses were missing (rating: moderate).

DISCUSSION

We believe this is the first systematic review to critically assess the quality of studies examining whether educational interventions to improve the cultural competence of health professionals are associated with improved patient outcomes. This paper updates the 2005 findings of Beach et al.13 by examining new curricular offerings (adding four new studies) and provides an analysis of research quality. As with many educational studies20,24, researchers faced threats to external validity. Importantly, the majority of the studies did not provide sufficient information on the curricula, providers/learners or patients to allow replication. Many studies lacked descriptions of potential variables that may have influenced results; for providers, these include prior training, age, ethnicity, gender, baseline attitudes and skills, and motivation to participate in training. Patient factors were not adequately accounted for a priori. Provider-patient race and language concordance and their potential effects were not consistently reported. Generalizability of findings was limited because study communities and settings were often unique. Moreover, some studies had multiple objectives with cross-contamination affecting patient outcomes, making it difficult to isolate the effect of provider training from system changes. The studies, albeit of limited quality, reveal a trend in the direction of a positive impact on patient outcomes. However, the current evidence overall appears neither robust nor consistent enough to support clear guidelines for CC training that would generate the greatest patient impact. It is also possible that CC training as a standalone strategy is inadequate to improve patient outcomes, and that concurrent systemic and systems changes, such as those directed at reducing errors or improving practice efficiency, and the inclusion of interpreters and community health promoters on the health care team, are needed to optimize its impact.

Our review used a comprehensive search strategy and a systematic process to assess study quality and identify potential reasons for inconsistent results. However, we were challenged to find an ideally designed tool for quality assessment. By using both the STROBE criteria29,30 (see Appendix) and the MERSQI20, we strove to maximize the validity of the quality review.

Synthesizing existing conceptual models of cultural competence with an established framework for evaluating methodological rigor in education research, we propose an algorithm (Fig. 2) as a guide to achieving excellence in methodological design. This model addresses both experimental (randomized, cluster randomized or quasi-randomized) and field (pre/post case control, cohort or cross-sectional) designs. The algorithm is based first on the theoretical framework described by Cooper and colleagues39, in which the quality of providers, including their cultural competence, is one of four mediators (the other three being appropriateness of care, efficacy of treatment and patient adherence) of high quality patient outcomes (categorized as health status, service equity and patient views of care). They stated that ‘… important limitations of previous studies include the lack of control groups, nonrandom assignment of subjects to experimental interventions, and use of health outcome measures that are not validated. Interventions might be improved by targeting high-risk populations, focusing on quality of care and health outcomes.’39 Second, we built on the model of methodological excellence advocated by Reed et al.20, who noted that existing educational studies of the highest quality used randomized controlled designs, had high response rates, and utilized objective data, valid instruments and statistical methods that included appropriate subgroup analyses, with accounting for confounding variables.

Figure 2. Suggested algorithm for educational studies on patient outcomes.

We recommend that educators consider Figure 2 a realistic guiding roadmap. When designing the conceptual framework of proposed studies, we advocate that researchers consider the strength of the existing evidence linking cause and effect40 in order to perform sample size calculations, as well as the reproducibility and generalizability of the results (internal and external validity, respectively). We propose that the description of providers/learners include information on past training, demographics, cultural and linguistic background, baseline skills and attitudes, and the health system (context) within which they function. Patients should be characterized by medical condition, demographics, health literacy, language proficiency, health beliefs, socioeconomic background and other potential confounders. The curriculum implemented should be described in sufficient detail for replication, including core resources and teachers, and the cost of training should be made explicit. Study designs should consider the type of subsequent analyses testing the relationship between the intervention and patient outcomes29,39. Provider educational interventions are often distant from clinical outcomes, and subjective constructs such as patient trust and the quality of the patient experience, assessed with validated measures41, have emerged as outcomes of intrinsic value that should also be considered in the cause-effect dynamic. Alongside traditional objective clinical indicators, outcomes should include process measures of the patient-physician partnership42,43, which may be considered intermediate or standalone goals in the attainment of best health care quality. All reasonable confounders should be captured to rule out alternative hypotheses and increase confidence in the results. Heterogeneity of providers and patients should be accounted for by subgroup analyses when reporting results, as CC training may have differential effects on different patients by ethnicity or disease. The durability of the effect of training on patient outcomes should be tested. Where system interventions other than provider training are concurrently introduced, three separate study arms may be needed to isolate the effect of training from the system change; a minimal sample size sketch for such a cluster design is given below. As our systematic review revealed, these standards have not been adequately met. However, two registered trials currently underway appear to meet many of the suggested criteria for experimental design, and results are eagerly awaited44,45 (personal communication with first author).
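
To illustrate the sample size step for a cluster-randomized design, the sketch below computes patients per arm for a binary patient outcome using the standard two-proportion formula, then applies the usual design effect 1 + (m − 1) × ICC. The proportions, cluster size and ICC are illustrative assumptions, not recommendations drawn from the reviewed studies.

```python
from scipy.stats import norm

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Patients per arm for a two-sided two-proportion z-test (individual randomization)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

def design_effect(cluster_size: float, icc: float) -> float:
    """Inflation factor for cluster randomization: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

# Illustrative assumptions: control-arm adherence 50%, target 60%,
# about 30 patients per provider cluster, intracluster correlation 0.02.
n_individual = n_per_arm(p1=0.50, p2=0.60)
deff = design_effect(cluster_size=30, icc=0.02)

print(f"per arm, individual randomization: {n_individual:.0f}")      # ~385
print(f"design effect: {deff:.2f}")                                  # 1.58
print(f"per arm, cluster randomization: {n_individual * deff:.0f}")  # ~608
```

With a third, system-change-only arm as recommended above, the same per-arm calculation applies, typically with an alpha adjustment for the planned pairwise comparisons.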

In an era of systems interventions using models such as the patient-centered medical home46,47, interprofessional education48 and teamwork training49 to achieve high quality care, a purely disease-oriented approach with attention only to clinical interventions is no longer adequate. Educators can and should take up the challenge of isolating specific training strategies as cost-effective and sustainable interventions for improving health care quality, particularly for chronic diseases. However, the quality of educational research has been shown to be directly associated with study funding29, and we acknowledge that prohibitive cost, noted as a limiting factor in at least one study we assessed36, limits the implementation of rigorous studies. Solutions for improving study quality include increasing power through multi-institutional studies, similar to multicenter clinical trials, and developing multi-institutional shared databases50. For curricula that apply across health professions (such as cross-cultural communication skills), providers/learners could be combined and results analyzed by subgroup, an approach that conserves resources and reduces cost. Curricular standardization and quality control are better achieved when materials are developed and delivered by expert groups with rigorous peer assessment. Training materials should be based on transparent, evidence-based, reproducible and validated techniques that attend to baseline competencies. As materials are developed, universal affordable access would help advance the field.

In conclusion, we assert that there is a critical need for increased resources to examine education as an independent intervention to improve health outcomes. The same level of planning, attention and scrutiny should be invested in comparative efficacy studies of educational interventions as in clinical and health services research. In light of our findings and proposed algorithm, a modified or new validated tool for evaluating the quality of such studies would be desirable.