Table 2

Overview of MIES development and validation process

Phase 1: Item development
Step 1: Identification of Domain and Item Generation: Selecting Which Items to Ask
  • Primarily a deductive approach: the domain was identified and items were generated from a scoping review of existing tools that measure medical internship experience (Patient Health Questionnaire-9,19 Perceived Stress Scale,20 Professional Quality of Life,21 Postgraduate Hospital Educational Environment Measure22 and Safety Attitude Questionnaire23), with additional questions on the challenges most common in low- and middle-income countries (LMICs), such as physical resources and patient safety

  • 102 items, all standardised to a 5-point Likert scale, either ‘very often – often – sometimes – rarely – never’ or ‘strongly agree – agree – neutral – disagree – strongly disagree’

Step 2: Content Validity: Assessing if the Items Adequately Measure the Domain of Interest
  • Evaluation by target population: discussion with a total of 43 medical interns in 7 countries to understand whether the domains and items represent the actual experience of interns. Items were revised based on interns’ feedback

  • Evaluation by experts: discussion with experts (n=14) who were clinicians responsible for training/supervision of interns and/or researchers familiar with survey/scale development processes. Experts rated each item for content relevance, representativeness and technical quality on a scale of 1 (not relevant), 2 (low relevance/needs major revision), 3 (medium relevance/needs minor alteration) or 4 (high relevance). Item-level content validity indices (I-CVI) were calculated, and 17 items with an I-CVI below 78% were dropped or revised.
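
The I-CVI calculation described above can be sketched as follows. This is a minimal illustration, assuming the conventional definition of I-CVI as the proportion of experts who rate an item 3 or 4 on the 4-point scale; the table does not state the exact formula used. The function names and the example ratings are hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Proportion of experts whose rating is at least `relevant_from`.

    Assumption: ratings of 3 or 4 on the 1-4 scale count as 'relevant'.
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= relevant_from) / len(ratings)


def flag_items(ratings_by_item, threshold=0.78):
    """Return the items whose I-CVI falls below the threshold (78% in the table)."""
    return [
        item
        for item, ratings in ratings_by_item.items()
        if item_cvi(ratings) < threshold
    ]


# Hypothetical ratings from a panel of 14 experts (not study data).
example = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 4],
    "item_b": [4, 3, 2, 4, 1, 3, 2, 4, 3, 2, 4, 3, 2, 3],
}

for name, ratings in example.items():
    print(name, round(item_cvi(ratings), 3))
print("flagged for review:", flag_items(example))
```

With 14 raters, an item needs at least 11 ratings of 3 or 4 (11/14 ≈ 0.79) to clear a 78% threshold under this definition.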

Phase 2: Scale development
Step 3: Pretesting Questions: Ensuring the Questions and Answers Are Meaningful
  • Translation into three additional languages (Mandarin Chinese, Vietnamese and French)

  • Cognitive interviews with 19 medical interns in 7 countries to assess face validity and to ensure that respondents understood the items as intended. Interviews used a mix of ‘think aloud’ (tell me what you are thinking as you answer this question) and ‘probing’ (what does term X mean to you, and why did you choose that answer) techniques, and items were further revised and rephrased at this phase.

Step 4: Survey Administration and Sample Size: Gathering Enough Data from the Right People
  • Survey self-administered online (using REDCap or Microsoft Forms) or on paper, with the sample collected from eight study countries (Kenya, Uganda, Burundi, Nigeria, Sierra Leone, Fiji, Vietnam and China) as well as through a global survey

  • Study population is the current cohort of medical interns or junior medical officers who finished internships in 2018 or after, identified using a mix of snowballing and purposive sampling approaches

  • A total of 1646 complete responses were collected as of January 2023, with an additional 77 responses dropped due to missing 10% of scale items
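
The exclusion rule for incomplete responses can be sketched as below. This is an illustration only: it assumes that a response is dropped when 10% or more of its scale items are missing, which is one reading of the rule stated above, and it represents a missing answer as `None`. The function names and the example responses are hypothetical.

```python
def missing_fraction(response):
    """Fraction of scale items left unanswered (None) in one response."""
    return sum(1 for value in response if value is None) / len(response)


def split_complete(responses, max_missing=0.10):
    """Separate responses into kept and dropped.

    Assumption: a response is dropped when its missing fraction is
    at or above `max_missing`.
    """
    kept, dropped = [], []
    for response in responses:
        if missing_fraction(response) >= max_missing:
            dropped.append(response)
        else:
            kept.append(response)
    return kept, dropped


# Hypothetical responses to a 20-item scale (not study data).
responses = [
    [3] * 20,               # nothing missing
    [3] * 19 + [None],      # 5% missing
    [3] * 17 + [None] * 3,  # 15% missing
]

kept, dropped = split_complete(responses)
print(len(kept), "kept;", len(dropped), "dropped")
```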

Step 5: Item Reduction: Ensuring the Scale Is Parsimonious
  • No item had a missing rate over 10% in the overall sample; missing data were replaced by the median because of data skewness and low frequency

  • Classical test theory (CTT) was used to select items based on interitem correlations. Using a cut-off of 0.3, 6 further items were dropped because of very low correlations
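
The two operations in this step, median imputation and screening on interitem correlations, can be sketched as follows. The table does not specify how the 0.3 cut-off was applied, so this sketch assumes one possible reading: an item is flagged when its mean Pearson correlation with all other items is below the cut-off. The function names and the item data are hypothetical.

```python
from math import sqrt
from statistics import median


def impute_median(column):
    """Replace missing answers (None) with the median of the observed answers."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def low_correlation_items(columns, cutoff=0.3):
    """Flag items whose mean correlation with the other items is below the cut-off.

    Assumption: 'mean interitem correlation' is the screening statistic.
    """
    names = list(columns)
    flagged = []
    for name in names:
        others = [pearson(columns[name], columns[o]) for o in names if o != name]
        if sum(others) / len(others) < cutoff:
            flagged.append(name)
    return flagged


# Hypothetical answers from six respondents to three items (not study data).
raw = {
    "q1": [1, 2, 3, 4, 5, None],
    "q2": [2, 2, 3, 4, 5, 3],
    "q3": [3, 5, 1, 5, 3, 1],
}
items = {name: impute_median(col) for name, col in raw.items()}
print("flagged:", low_correlation_items(items))
```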

Step 6: Extraction of Factors: Exploring the Number of Latent Constructs that Fit Your Observed Data
  • The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was high (0.97) and Bartlett’s test of sphericity was significant, indicating that the data were suitable for factor analysis

  • Exploratory factor analysis was conducted on the remaining items, with six factors explaining 90% of variance.

  • 32 items were removed due to cross-loading, leaving a final 50-item scale
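
As an illustration of the suitability checks in this step, the sketch below computes the test statistic for Bartlett’s test of sphericity from a correlation matrix, using the standard formula chi-square = -(n - 1 - (2p + 5)/6) × ln|R| with p(p - 1)/2 degrees of freedom, where p is the number of items and n the number of respondents. It is a standard-library sketch, not the software used in the study; the function names and the example matrix are hypothetical, and the p-value (which needs a chi-square distribution function) is not computed.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for a correlation matrix."""
    p = len(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical 2-item correlation matrix from 100 respondents (not study data).
chi2, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=100)
print(round(chi2, 2), df)
```

A statistic near zero means the correlation matrix is close to an identity matrix, in which case factor analysis would have little shared variance to work with.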

Phase 3: Scale evaluation
Step 7: Tests of Dimensionality: Testing if Latent Constructs Are as Hypothesised
  • Confirmatory factor analysis on the sample showed that CFI (0.89) was less than satisfactory, while RMSEA (0.05) and SRMR (0.05) were satisfactory

  • Multigroup confirmatory factor analysis for measurement equivalence suggested that the model fitted poorly for configural invariance, indicating that factor loadings differ across countries. This could be due to various reasons, including an adequate but still small sample size, cross-loading of items, etc.
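
For readers unfamiliar with the fit indices reported in this step, the sketch below shows how RMSEA and CFI are commonly derived from model chi-square values. It assumes the widely used textbook formulas, with N - 1 in the RMSEA denominator (some software uses N instead); it is not the output of the study’s software, and the chi-square values, degrees of freedom and sample size in the example are hypothetical.

```python
from math import sqrt


def rmsea(chi2, df, n_obs):
    """Root mean square error of approximation (assumes the N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index of the fitted model against the baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0:
        return 1.0  # neither model shows any misfit
    return 1.0 - d_model / d_baseline


# Hypothetical values (not study data).
print(round(rmsea(chi2=300, df=100, n_obs=401), 3))
print(round(cfi(chi2_model=300, df_model=100, chi2_baseline=2120, df_baseline=120), 3))
```

Lower RMSEA and higher CFI both indicate better fit, which is why the two indices can point in different directions for the same model, as in the results above.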

Step 8: Tests of Reliability: Establishing if Responses Are Consistent When Repeated
  • Cronbach’s alpha is a measure of internal consistency, that is, how closely related the items within a group are. Alpha for the overall final scale was 0.95 (excellent) and ranged from 0.74 (acceptable) to 0.93 (excellent) for the six identified factors. The scale-specific and factor-specific alpha results were similar for most countries.
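
Cronbach’s alpha can be computed from item scores as sketched below, using the standard formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the example scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    Each column holds the scores of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of five respondents on three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]
print(round(cronbach_alpha(items), 3))
```

Applying the same function to the items of a single factor gives the factor-specific alpha.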

Step 9: Tests of Validity: Ensuring You Measure the Latent Dimension You Intended
  • We examined the validity of MIES in line with Cook’s recommendation

  • Content validity of MIES was ensured as the item development process included a scoping review to identify relevant tools, as well as content validity discussions with both the target population and an expert panel

  • Response process refers to how well respondents’ answers align with the intended construct; we used cognitive interviews as part of pilot testing to ensure that respondents understood the items as intended

  • Evidence on the internal structure of the scale was derived from internal consistency and factor structure analysis

  • We do not yet have evidence on relations to other variables or on consequences for the MIES, but the tool could be used in the future to identify hospitals where interns have poorer internship experiences and to further improve the internship training environment

CFI, comparative fit index; RMSEA, root mean square error of approximation; SRMR, standardised root mean square residual.