Development and validation of a new measurement instrument to assess internship experience of medical doctors in low-income and middle-income countries
  1. Yingxi Zhao1,
  2. Sulaiman Jalloh2,
  3. Phung Khanh Lam3,4,
  4. Yakubu Kevin Kwarshak5,
  5. Daniel Mbuthia6,
  6. Nadine Misago7,
  7. Mesulame Namedre8,
  8. Nguyễn Thị Bé Phương4,
  9. Sefanaia Qaloewa9,
  10. Richard Summers10,
  11. Kun Tang11,
  12. Raymond Tweheyo12,13,
  13. Bridget Wills1,3,
  14. Fang Zhang14,
  15. Catia Nicodemo15,16,
  16. David Gathara6,17,
  17. Mike English1,6
  1. 1NDM Centre for Global Health Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  2. 2Ola During Children's Hospital, Freetown, Sierra Leone
  3. 3Oxford University Clinical Research Unit, Ho Chi Minh City, Viet Nam
  4. 4University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
  5. 5Department of Surgery, Division of Urology, Jos University Teaching Hospital, Jos, Plateau State, Nigeria
  6. 6KEMRI-Wellcome Trust Research Programme, Nairobi, Kenya
  7. 7Interdisciplinary Research Group in Public Health / Doctoral School, University of Burundi, Bujumbura, Burundi
  8. 8Independent Researcher, Suva, Fiji
  9. 9College of Medicine, Nursing and Health Sciences, Fiji National University, Suva, Fiji
  10. 10School of Social Policy, University of Birmingham, Birmingham, UK
  11. 11Vanke School of Public Health, Tsinghua University, Beijing, People's Republic of China
  12. 12Department of Health Policy Planning and Management, Makerere University School of Public Health, Kampala, Uganda
  13. 13Centre for Health Systems Research and Development (CHSRD), The University of Free State, Bloemfontein, South Africa
  14. 14Department of Endocrinology and Metabolism, Peking University People’s Hospital, Beijing, People's Republic of China
  15. 15Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
  16. 16Department of Economics, Verona University, Verona, Italy
  17. 17MARCH Centre, London School of Hygiene and Tropical Medicine, London, UK
  1. Correspondence to Yingxi Zhao; yingxi.zhao{at}


Routine surveys are used to understand the training quality and experiences of junior doctors, but there is a lack of tools designed to evaluate the training experiences of interns in low-income and middle-income countries (LMICs) where working conditions and resource constraints are challenging. We describe our process of developing and validating a ‘medical internship experience scale’ to address this gap, work involving nine LMICs that varied in geographical locations, income-level and internship training models. We used a scoping review of existing tools, content validity discussions with target populations and an expert panel, back-and-forth translations into four language versions and cognitive interviews to develop and test the tool. Using data collected from 1646 interns and junior medical doctors, we assessed the tool’s factor structure, reliability and validity. Fifty items about experiences of medical internship were retained from an initial pool of 102 items. These 50 items represent 6 major factors (constructs): (1) clinical learning and supervision, (2) patient safety, (3) job satisfaction, (4) stress and burnout, (5) mental well-being, and (6) fairness and discrimination. We reflect on the process of multicountry scale development and highlight some considerations for others who may use our scale, using preliminary analyses of the 1646 responses to illustrate that the tool may produce useful data to identify priorities for action. We suggest this tool could enable LMICs to assess key metrics regarding interns’ training and initial work experiences and possibly allow comparison across countries and over time, to inform better internship planning and management.

  • Medical education
  • internship experience
  • scale development
  • measurement
  • low- and middle-income countries

Data availability statement

Data are available upon reasonable request. The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:

Summary box

  • Internship experience can be challenging due to the rapid transition from medical school to clinical practice, especially long working hours, high workloads and constant new learning and assessment.

  • Countries like the UK and US conduct routine surveys of their doctors in training, led by regulators, to understand their experiences and monitor and report on training quality. However, most low-income and middle-income countries (LMICs) do not have similar routine surveys and there is a relative lack of research on internship experiences in these countries.

  • With collaborators from 9 LMICs, we developed a 50-item Medical Internship Experience Scale (MIES) based on data from 1646 medical interns and junior doctors from LMICs.

  • MIES is reliable and valid and broadly covers six major constructs: clinical learning and supervision, patient safety, job satisfaction, stress and burnout, mental well-being, and fairness and discrimination. This tool could be used by governments, medical schools and regulators to compare internship experiences across different training facilities and to identify specific areas where improvements are needed.


Medical internship is the period when doctors in training transition from medical education into clinical practice typically before they become licensed and registered as independent medical practitioners. The programme of internship may differ across countries: in the UK, Modernising Medical Careers was introduced in 2005 to create a 2-year Foundation Programme1; in many low-income and middle-income countries (LMICs), internship is a 1-year stand-alone programme leading to licensure2; while for countries like the US, internship usually refers to the first year of residency where qualified doctors undertake graduate medical education to obtain a license for a chosen specialty.3

Despite the different terminology in each setting, interns often work long hours while learning and being assessed.3 Interns also need to shift their identity to that of physicians and take on new responsibilities and challenges. Some may lack confidence in their ability to manage uncertainty when responsible for others’ lives especially when facing sudden patient deaths.4–7 Such factors can result in rapid burnout, stress and other mental health problems.4 8 9 In LMICs due to resource constraints, interns may also experience low availability of essential medicines and equipment, limited supervision and feedback,10 poor safety climate, extremely poor working conditions and work without pay.11 For example, Erasmus described interns in South Africa as ‘slaves of the state’.12

Countries like the UK and US conduct routine surveys of their doctors to understand their training experiences and the quality of that training, led by the General Medical Council13 and Accreditation Council for Graduate Medical Education,14 respectively. These surveys are relatively broad, for example, spanning perspectives on teamwork, workload and patient safety. More recently in 2018, the UK General Medical Council added a burnout inventory to its survey.13

In comparison, most LMICs do not have such routine surveys, as confirmed by a scoping review of quantitative tools that measure internship experiences.15 Only 14 out of 92 included studies were conducted in LMICs. The review also revealed a lack of common definitions of key areas to measure, and substantial variation in the questions in major national trainee surveys,13 14 16 limiting options for comparison across countries. Existing tools from high-income settings might not capture significant differences in context in LMICs: for example, poor infrastructure and material resources availability in internship hospitals17 are common in LMICs. Therefore, we aimed to develop and validate a tool, the ‘medical internship experience scale (MIES)’, focusing on the internship experience of medical doctors in LMICs.

Development of the MIES

We followed Boateng’s 9-step scale development and validation framework for the development of MIES.18 Collaborators from nine countries were involved in different stages of scale development and validation processes (table 1). Eight of them contributed data to a final survey sample alongside some responses from non-study countries. The nine study countries varied in geographical locations, income-level and internship training models, which allowed us to understand and develop a tool suited for use across settings. An overview of the scale development and validation process is provided in table 2, with step-by-step detail provided in online supplemental appendix 1. Information on the nine study countries is provided in table 1.

Table 1

Overview of the nine study countries for MIES development and validation process

Table 2

Overview of MIES development and validation process

Item development

First, we conducted a scoping review of the tools that measure medical internship experience that is reported elsewhere.15 We summarised the major themes examined by 92 studies and identified three domains of interest: well-being, educational environment and work conditions and environment. We generated an item pool through reviewing existing tools and indicators from the scoping review, adapting five commonly used tools (table 2). We supplemented these with additional questions on physical resources and patient safety. We standardised this initial set of items (n=102) and responses intending that they measure the broad internship experience and are collectively compatible.

We assessed content validity with the target population and experts.18 For target population, we conducted discussions with 43 interns in 7 countries to understand whether the domains and items were relevant to internship experiences. Items were revised, added or dropped at this stage (online supplemental appendix 2). We then conducted validation involving 14 purposefully selected experts on medical training and/or scale development to evaluate items for content relevance, representativeness and technical quality (online supplemental appendix 3). We retained 88 items after this phase (online supplemental appendix 4).

Scale development

The tool was translated into three additional languages (Mandarin Chinese, Vietnamese, and French, online supplemental appendix 4), led by study country collaborators using forward and back translation.19 We then conducted pretesting and cognitive interviews20 21 with 19 medical interns to ensure that the items are meaningful to the target population and that the MIES survey could be successfully administered. Pretesting was conducted using country collaborators’ proposed mode (online or paper), in English or translated language and alongside cognitive interviews. Items were further revised and rephrased at this phase.

We then moved on to the survey administration stage. The analytical sample was collected from eight study countries (excluding South Africa) using a mix of snowballing and purposive sampling approaches as well as through an open survey shared via social media and colleagues (table 1). The study population eligible for the MIES survey was the cohort of medical interns or junior doctors in 2022 who finished internships in 2018 or after. The survey was self-administered by participants either online or using paper-based questionnaires. A total of 1646 complete responses were collected and used as our analytical sample, of which 113 were collected from 14 non-study countries through the open survey. Overall, the mean age of the study sample was 27.8 years. Thirty-nine per cent of the sample were interns at the time of survey administration (on average having completed 7 months of internship) and the rest were within 3 years post internship. We acknowledge that our survey sample includes some interns yet to complete their internship and doctors up to 3 years post internship, which could influence our findings, although further analysis did not suggest any obvious differences linked to time post internship (online supplemental appendix 9). Table 2 presents the characteristics of the sample by country. Notably, countries like Burundi, China and Fiji had a higher proportion of current interns, whereas only 13% of Ugandan respondents were current interns because one cohort had just completed internship at the time of the survey.

When selecting the items, no item had a missing rate above 10% in the overall sample; therefore, we kept all items and replaced the missing values with the median of each item. We then conducted item reduction analysis using inter-item and item-total correlations, techniques in line with classical test theory. Six items were dropped because of very low correlations with other items, potentially because they were not measuring similar constructs or were not fully understood by participants. Details on each of the items tested in the analytical sample can be found in online supplemental appendix 5.
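The two steps described above, median imputation followed by item-total correlation screening, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the response matrix is invented, the function names are hypothetical, and the corrected item-total correlation (each item against the sum of the remaining items) is one common variant rather than necessarily the exact statistic used in the study.

```python
from statistics import mean, median

def impute_median(responses):
    """Replace missing answers (None) with the median of the observed answers for that item."""
    n_items = len(responses[0])
    medians = [
        median(row[j] for row in responses if row[j] is not None)
        for j in range(n_items)
    ]
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_items)]
        for row in responses
    ]

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cross / (sx * sy)

def corrected_item_total(responses):
    """Correlate each item with the sum of all other items."""
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        correlations.append(pearson(item, rest))
    return correlations

# Hypothetical data: 6 respondents x 4 items on a 1-5 scale, one missing answer.
raw = [
    [5, 4, 5, 1],
    [4, 4, None, 5],
    [2, 1, 2, 2],
    [1, 2, 1, 4],
    [3, 3, 3, 3],
    [4, 5, 4, 2],
]
completed = impute_median(raw)
correlations = corrected_item_total(completed)
print([round(r, 2) for r in correlations])
```

In this invented example the fourth item correlates negatively with the rest, which is the kind of pattern that would flag an item for review; any numerical cut-off for "very low" is left to the analyst, since the text does not state one.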

We used factor analysis to understand the latent structure of the items.18 After confirming with the Kaiser-Meyer-Olkin and Bartlett’s tests that the data were suitable for factor analysis, we conducted exploratory factor analysis (EFA) and used scree plots, the variance explained by the factor model and the factor loading pattern to determine the number of factors to retain.18 22 Six factors had eigenvalues exceeding one; inspection of the scree plot, however, revealed a likely break after the third factor. We decided to retain the six-factor solution because it comprised fewer items overall (50 vs 65) and in our opinion had a more intuitive domain structure. Furthermore, the three-factor solution did not significantly outperform the six-factor solution (online supplemental appendix 6 and 7). We further removed 32 items with cross-loading across three rounds of testing.
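The "eigenvalues exceeding one" rule mentioned above (often called the Kaiser criterion) can be sketched as follows. This is an illustration only: the correlation matrix is invented, the function names are hypothetical, and the hand-written Jacobi eigenvalue routine is used solely to keep the example free of dependencies; a real analysis would rely on a statistical package and would weigh the scree plot and loading pattern as well.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def kaiser_retained(eigenvalues, threshold=1.0):
    """Number of factors whose eigenvalue exceeds the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)

# Hypothetical correlation matrix: items 1-2 and items 3-4 form two unrelated pairs.
correlation_matrix = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
eigenvalues = jacobi_eigenvalues(correlation_matrix)
print(eigenvalues, kaiser_retained(eigenvalues))
```

The eigenvalues of a correlation matrix sum to the number of items, so an eigenvalue above one marks a factor that accounts for more variance than a single item would on its own.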

Based on these results, and after reading the items, we named the six factors: (1) clinical learning and supervision; (2) patient safety; (3) stress and burnout; (4) job satisfaction; (5) mental well-being; and (6) fairness and discrimination, respectively. Final items and their corresponding factor loading as well as sources are presented in table 3.

Table 3

Final items included for the MIES scale

Scale evaluation

We then moved on to examining dimensionality. We primarily focused on measurement invariance, that is, whether the psychometric properties are generalisable across different subgroups, in our case across different countries. We first ran the confirmatory factor analysis again on the overall sample (n=1646) and in each individual country. The overall sample comparative fit index (CFI, 0.89) was less than satisfactory while root mean square error of approximation (0.05) and standard root mean square residual (SRMR, 0.05) were satisfactory. The results were similar for the 4 countries with over 150 respondents (online supplemental appendix 1). We then conducted multigroup confirmatory factor analysis in countries with over 150 respondents (Kenya, Uganda, Vietnam and China sample, n=1182). Given the less-than-ideal CFI in the overall sample, model fit was moderate for configural invariance (CFI=0.86). This suggests some differences in terms of factor loading across different countries. This could be due to relatively small sample sizes, cross-loading of items, or other reasons. Comparing results such as aggregate scores across different countries therefore requires extra caution as the scales might perform slightly differently in each country.

For reliability, we calculated Cronbach’s alpha to assess the internal consistency of the scale items, which is one commonly used reliability criterion. Cronbach’s alpha for the overall final scale items was 0.95 and ranged from 0.74 to 0.93 for the six identified factors (table 3). The scale-specific and factor-specific alpha results were similar for most countries. We examined the validity of MIES in line with Cook’s recommendation23 (table 2).
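For readers less familiar with the statistic, Cronbach’s alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows, using invented responses and a hypothetical function name; it is not the study’s analysis code.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 6 respondents x 3 items on a 1-5 scale.
sample = [
    [5, 4, 5],
    [4, 4, 3],
    [2, 1, 2],
    [1, 2, 1],
    [3, 3, 3],
    [4, 5, 4],
]
alpha = cronbach_alpha(sample)
print(round(alpha, 2))
```

Alpha rises with both the number of items and their average inter-correlation, which is one reason a 50-item overall scale can reach a higher value than its individual factors.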

Lessons and implications for practice

In summary, we developed and validated a scale to measure the internship experience of medical doctors in LMICs. We tested for reliability, factor structure and validity across four languages and different countries varying in internship training contexts. Fifty items that comprise the scale broadly cover six major constructs, that is, clinical learning and supervision, patient safety, stress and burnout, job satisfaction, mental well-being, and fairness and discrimination. The scale developed is built on several existing tools such as Postgraduate Hospital Educational Environment Measure (PHEEM; 40 items covering 3 constructs of autonomy, teaching and social support),24 Perceived Stress Scale (PSS; 10 items with 1 construct)25 and Professional Quality of Life Measure (ProQOL; 30 items covering 3 constructs of compassion satisfaction, burnout and secondary traumatic stress).26 However, these existing tools include items not specific to internship training and/or not relevant to LMICs.15 We therefore reviewed and revised items through content validity discussions with junior doctors in LMICs, an expert panel and cognitive interviews. Table 3 provides the information on whether the final items are adapted from existing tools or developed as new, for example, items under clinical supervision and learning are all adapted from PHEEM, items under stress and burnout are all adapted from PSS and ProQOL whereas nearly half of the patient safety items are newly added.

Reflection on the multicountry scale development process

Scale development is an iterative, complicated and onerous process, and often requires researchers to make their own decisions in terms of what methods and approach to use. Our scale development was conducted in a multi-country and multi-language setting, which itself presented additional analytical and logistics challenges.

At the item development stage, we conducted rounds of target population content validity discussions and cognitive interviews as well as an expert panel to ensure cross-cultural equivalence of the items. The cognitive interviews were especially helpful in ensuring face and content validity to revise or remove redundant, ambiguous or difficult-to-understand items.20 However, this resulted in a few items being dropped at this stage because they were not universally relevant. This included items on pay and remuneration as some countries do not pay their interns since internship is considered part of preservice education.

We faced logistics issues when surveying across countries as this required use of different data collection platforms (paper-based, REDCap, Microsoft Forms, Wenjuanxing) and languages as well as different considerations in survey advertisement, recruitment and incentives. For example, in Uganda, data were collected online supplemented by field visits to 4 major hospitals in a 3-month period and resulted in a substantial sample size (n=487). In Kenya, we partnered with the professional association, regulator and major medical schools, and advertised through their channels offering 1 GB mobile data as an incentive. Despite a lengthy process over 8–9 months aiming to reach all eligible respondents, we estimate that we received complete responses from approximately 15% of the eligible population. We hoped for a minimum of 150 responses from each country to support invariance analysis. However, Sierra Leone has roughly 50 interns per year and so achieving our proposed sample size would be extremely challenging.

Examples of using the MIES

Our newly developed tool could enable LMICs to assess key human resource metrics for interns and possibly allow comparison across countries. As shown in figure 1, by calculating and comparing the aggregate scores of the six factors (after reversing negatively scored items), we observed a difference in the total scores (all with a maximum score of 5) between the eight study countries. Job satisfaction (factor 4) was perhaps surprisingly rated relatively high across all countries (range from median 3.43 in Nigeria and China to 4.29 in Uganda) whereas stress and burnout (factor 3) might be considered concerning (range from median 2.30 in Nigeria to 3.40 in China). While these data suggest differences between countries, further testing of the psychometric properties of this tool and improved representativeness of sampling would be important to confirm findings. Further illustration of the potential value of MIES is shown in table 4. Ranking those items scored lowest and highest from the global and country-specific samples suggests that, other than in China and Vietnam, stress and well-being are of concern, with interns unable to balance their work and personal life and constantly stressed. Such results indicate that improving interns’ well-being and relieving workload and stress is an urgent agenda in several countries.27 28 Interestingly, interns were generally proud of their work, believing they could make a difference. However, responses from China and Vietnam may suggest that efforts are needed to increase junior doctors’ job satisfaction and sense of personal accomplishment. The low job satisfaction might be attributed to broader issues such as tense doctor–patient relationships and threatened professional identity as seen in China.29–33

Figure 1

Aggregate score by factors in selected countries. All ‘negative’ items have been reversed and higher score in each factor is more favourable.
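For those wishing to reproduce this kind of summary with their own data, the scoring logic can be sketched as below. The sketch rests on stated assumptions rather than on the study’s code: a 1 to 5 response scale, a factor score taken as the mean of its items after reverse-coding, and invented item identifiers (`q1` to `q3`) and responses.

```python
from statistics import mean, median

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point response scale

def reverse(score):
    """Reverse-code a negatively worded item so that higher is more favourable."""
    return SCALE_MIN + SCALE_MAX - score

def factor_score(response, items, negative_items):
    """Mean of a respondent's item scores for one factor, after reverse-coding."""
    values = [
        reverse(response[item]) if item in negative_items else response[item]
        for item in items
    ]
    return mean(values)

def country_median(responses, items, negative_items):
    """Median factor score across all respondents from one country."""
    return median(factor_score(r, items, negative_items) for r in responses)

# Hypothetical factor with three items, one of them negatively worded.
factor_items = ["q1", "q2", "q3"]
negative = {"q2"}
country_responses = [
    {"q1": 5, "q2": 1, "q3": 4},
    {"q1": 3, "q2": 3, "q3": 3},
    {"q1": 2, "q2": 4, "q3": 2},
]
print(country_median(country_responses, factor_items, negative))
```

Because every factor score stays on the original response scale, factors with different numbers of items can be read side by side, as in figure 1.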

Table 4

Questions/items that were ranked lowest and highest based on responses to all six factors and using ‘global’ data from all countries

As many countries face workforce challenges, we suggest that governments, medical schools or licensing bodies could use the MIES tool to assess internship experiences in a country, potentially across different training facilities to identify areas where improvements are needed. For example, our preliminary data suggest significant differences in interns’ experience of patient safety and job satisfaction between facilities of different sizes in Kenya (online supplemental appendix 8). The tool could also be used to track changes in internship experiences over time, perhaps administered as an annual internship exit survey, or to explore cross-country differences. It also has the possibility of being linked with other datasets such as workforce registries, similar to the annual national training survey and UK medical education database, to inform workforce planning and support education and training regulation.34

Additional considerations for others using the scale

We opted for a six-factor structure for MIES instead of a three-factor structure. Both versions have their pros and cons: they performed similarly in subsequent dimensionality and reliability testing, and both had a slightly poor model fit, which is common in studies running confirmatory factor analysis and EFA in the same sample.35 The six-factor version has 50 items, so it is shorter despite having more domains and therefore easier to implement. We also felt its domain labels were intuitive, perhaps helping prompt appropriate actions. One advantage of the three-factor version might be that it includes items related to resource availability. In some settings, such items may be important to add, for example, ‘the internship hospital has adequate supply of diagnostics, equipment and medication for my study and work needs’ (Q88) and ‘the internship hospital has good quality accommodation for me when on call’ (Q85). We provide further information on the three-factor and six-factor domains and items, and the original 88 items (online supplemental appendix 6). Countries designing internship surveys could add in further questions of specific interest such as those on remuneration and career intentions as stand-alone issues.

One key limitation is the measurement equivalence of the scale across different countries. According to the confirmatory factor analysis and the multigroup confirmatory factor analysis, configural invariance is less than ideal suggesting that the factor number and loading pattern could be slightly different across countries. This could be due to various reasons including relatively small sample sizes and cross-loading of items. While we conducted content validity discussions and cognitive interviews in different countries to ensure items are similarly understood, some items might have been interpreted differently in different countries. Second, we acknowledge that different data collection approaches in our study countries could lead to potential biases, for example, two countries used a paper-based survey and some countries only sampled participants from major hospital sites due to logistics considerations. Third, despite reducing the item number to 50, the tool is still relatively long and would require 15–30 min to answer, which could be a disadvantage for busy interns leading to survey dropout. We also acknowledge that despite having nine countries with varied geographical locations, income-level, and internship training models in the scale development process, there could be other context-specific factors influencing the internship experiences in other countries. Future studies should explore other types of validity and reliability of this tool including test–retest reliability, psychometrics in other countries and perhaps conduct further testing to produce a shorter form of the tool.


In conclusion, we developed and validated a scale to measure the internship experience of medical doctors in LMICs. We tested for its reliability, factor structure and validity across four languages and eight different countries varying in internship training contexts. The final six-factor scale includes 50 items that broadly cover six major constructs, that is, clinical learning and supervision, patient safety, job satisfaction, stress and burnout, mental well-being, fairness and discrimination. This tool could be used to inform better internship planning and management especially improving junior doctors’ experiences during internship.

Ethics statements

Patient consent for publication

Ethics approval

Ethical approvals were issued by Oxford Tropical Research Ethics Committee (OxTREC 563-20 and OxTREC 518-21) as well as by individual study countries that contributed to the survey data collection (as listed in Table 2). Participants gave informed consent to participate in the study before taking part.


We thank Kathanruben Naidoo and Reshania Naidoo (University of KwaZulu-Natal) for their assistance in the initial stages of scale development in South Africa, and Arnaud Iradukunda (Kamenge Teaching Hospital, University of Burundi), Fine Ineza Nsabiyumva (Hope Africa University) and Raoul Ndayiragije (University of Ngozi) for their assistance with data collection in Burundi.



  • Handling editor Seema Biswas

  • Twitter @YingxiZhao, @kwarshak2

  • Contributors YZ, DG, CN and ME designed the study. YZ, SJ, PKL, YKK, DM, NM, MN, NTBP, SQ, KT, RT, BW and FZ contributed to data collection in study countries. YZ oversaw data collection, conducted statistical analysis and wrote the first draft of the manuscript. RS advised on scale development and validation. All authors provided critical feedback on the first draft of the manuscript, read and approved the final manuscript. Aside from YZ, DG, CN and ME, all other authors are listed alphabetically in order.

  • Funding This work is supported by an Africa Oxford travel grant (AfOx-209). YZ is supported by the University of Oxford Clarendon Fund Scholarship, an Oxford Travel Abroad Bursary and a Keble Association grant. ME is supported by a Wellcome Trust Senior Research Fellowship (#207522). CN receives funding from the Economic and Social Research Council [grant number ES/T008415/1]. National Institute for Health Research Applied Research Collaboration Oxford and Thames Valley at Oxford Health NHS Foundation Trust. Consortium iNEST (Interconnected North-Est Innovation Ecosystem) funded by the European Union NextGenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) – Missione 4 Componente 2, Investimento 1.5 – D.D. 1058 23/06/2022, ECS_00000043).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.