The influence of gender and ethnicity on facemasks and respiratory protective equipment fit: a systematic review and meta-analysis

Introduction Black, Asian and minority ethnic (BAME) people are disproportionately affected by COVID-19. Respiratory protective equipment (RPE) has conventionally been developed for a predominantly white male population that does not represent the healthcare workforce. The literature was reviewed to determine the protection offered to female and BAME users. Methods Five databases were searched. Eligible studies related to respirator fit in the context of anthropometrics, gender and ethnicity. Meta-analysis was performed for gender-based anthropometric differences. A priori protocol registration was not performed. Results 32 studies were included and anthropometric data was extracted from 15 studies. Meta-analysis revealed 14 anthropometric measurements were significantly smaller for females. Mean differences ranged from 0.37 mm to 22.05 mm. Gender-based anthropometric differences did not always translate to lower fit factor scores, with 12 studies reporting worse performance and fit for females and 10 reporting no gender effect. No studies provided disaggregate anthropometric data by ethnic group. Pass rates (PR) were low or moderate in 12 BAME or mixed-ethnicity cohorts. 14 studies reported associations between facial dimensions (FD) and respirator fit. Three comparative studies showed lower PR among selective BAME people. 18 studies reported RPE performance differed with model and design. Most studies did not prespecify inclusion/exclusion criteria. Small sample size and lack of justification or power calculations was a concern. Significant heterogeneity in study designs limited comparisons, particularly relating to respirator selection or availability and defining study outcomes relating to RPE performance. Conclusion The literature reports on largely Caucasian or single ethnic populations, and BAME people remain under-represented, limiting comparisons between ethnic groups. 
Facial anthropometrics vary between gender and likely between ethnicity, which may contribute to lower PR among females and ethnic minorities, particularly Asians. There is a need for studies including a broader spectrum of ethnicities and for consideration of female and BAME users during RPE development.

What do the new findings imply?
► Meta-analysis revealed 14 standardised anthropometric measurements were significantly smaller for females.
► Mean differences in measurements ranged from 0.37 mm for the smallest dimension (nasal root breadth) to 22.05 mm for the greatest dimension (bitragion-menton arc).
► Meta-analysis of anthropometrics between ethnicity or of RPE performance outcomes was not possible due to reporting and study heterogeneity.
► There are limitations to the included studies, namely small sample size (n<50), inconsistency of RPE tested across participant cohorts and, on risk of bias assessment, the failure of most studies to prespecify inclusion/exclusion criteria.
► Significant heterogeneity in study designs limits direct comparison.
► Including only English language studies is a significant limitation given the focus of this review; inclusion of Chinese records in particular may affect results significantly.

INTRODUCTION
There is growing evidence that black, Asian and minority ethnic (BAME) people are disproportionately affected by SARS-CoV-2 (COVID-19). [1][2][3][4][5] Indeed, data from the UK-based Office for National Statistics demonstrate that COVID-19-related death rates in BAME communities are four times higher than those of people of white ethnicity. 6 BAME people comprise only 14% of the population in the UK, yet account for 34% of COVID-19-related admissions to intensive care and 35% of deaths. 7 8 Similar trends are seen internationally. 9-11 BAME people comprise a large proportion of workers in essential services, 12 including healthcare, and their over-representation among patients affected by COVID-19 is a growing concern. Among National Health Service (NHS) staff, 63% of COVID-related deaths are of BAME people even though they represent only 20% of the NHS workforce. 13 14 The effect is likely multifactorial, 4 5 and addressing these ethnic inequalities requires effort on several fronts, including effective personal protective equipment (PPE) in the workplace.
Respiratory protective equipment (RPE) is vital in the prevention of nosocomial viral transmission. Systematic reviews and meta-analyses demonstrate the use of masks can reduce the risk of respiratory virus infection by 80%, suggesting mask use offers significant protection against transmission of respiratory viruses such as influenza, SARS and COVID-19. 15 In the context of COVID-19, mask use has been shown to reduce the risk of infection by nearly 70% among healthcare workers, highlighting the importance of RPE in the current pandemic. 16
European and American safety regulatory bodies such as the Occupational Safety and Health Administration (OSHA) or Health and Safety Executive mandate that RPE must meet certification requirements, such as those developed by the National Institute for Occupational Safety and Health (NIOSH), International Organization for Standardization (ISO) or British Standards Institution (BSI). [17][18][19] Certification requires respirators to be fit-tested on participants from a respirator fit test panel (RFTP) comprising subjects with facial sizes representative of the user population. Historically, sizing and respirator certification has been based on the Los Alamos National Laboratory (LANL) standardised adult head shape panels, developed in the 1960s using a US Air Force (USAF) Anthropometry Survey of predominantly white male military personnel. 20 The bivariate RFTP referenced for half-mask respirators uses two facial measurements, face length and lip length (figure 1). With evolving population demographics such as changing body shape and increasing female and BAME representation, the USAF data are no longer reflective of current American workers. 21 Therefore, NIOSH created a novel anthropometric database. 
This has been used to update the bivariate panel to include face length and face width as well as identify 10 facial dimensions (FD) most relevant to respirator fit, which defines the principal component analysis model. 22 In the UK, BSI standards have been based on the 50th percentile of four dimensions (face length, face width, face depth and mouth width) of the adult white male face shape (figure 1). 23 More recent panels have included a more ethnically diverse sample group.
Fit testing is used to determine if the facial fit of a respirator is free of significant inward leak. Both qualitative fit test (QLFT) and quantitative fit test (QNFT) are recommended. 19 24 QLFT relies on the wearer's olfactory or taste response to an aerosolised solution. QNFT measures the ratio of external aerosol concentration to internal aerosol concentration to produce a fit factor (FF) score. Definitions and standards have evolved over time, but OSHA currently recommends a QNFT FF score of 100, which affords the user adequate protection and is equivalent to a successful QLFT. 24 Suboptimal fit compromises respiratory protection and can be damaging to underlying skin. 25
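The FF definition above can be illustrated with a minimal sketch. The function names and the particle concentrations are hypothetical and chosen only for illustration; real QNFT protocols combine several exercises, which is not modelled here.

```python
def fit_factor(ambient_conc: float, in_mask_conc: float) -> float:
    """Fit factor: ratio of external (ambient) to internal (in-mask) aerosol concentration."""
    if in_mask_conc <= 0:
        raise ValueError("in-mask concentration must be positive")
    return ambient_conc / in_mask_conc


def passes_qnft(ff: float, threshold: float = 100.0) -> bool:
    """Pass if the FF reaches the threshold (100, the OSHA figure cited in the text)."""
    return ff >= threshold


# Hypothetical concentrations in particles per cubic centimetre.
good_fit = fit_factor(5000.0, 25.0)   # little inward leak
poor_fit = fit_factor(5000.0, 80.0)   # more inward leak
print(good_fit, passes_qnft(good_fit))
print(poor_fit, passes_qnft(poor_fit))
```

A higher FF means less aerosol reaches the inside of the facepiece relative to the surrounding air.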

BMJ Global Health
The relationship between FD and RPE shape determines RPE fit. FD vary significantly between genders, ethnicities and with age, 26 as well as on an individual basis. These may influence RPE fit and there is already some, although mixed, evidence that RPE protection varies with gender-based differences in facial dimension. [27][28][29] Certainly, studies of BAME cohorts have yielded particularly low success rates of fit-testing, and similar trends are seen among healthcare workers. [28][29][30] These findings may be important in respirator design and manufacturing processes. While newer RFTPs may be more diverse, they are not necessarily representative of healthcare workers (HCWs) or BAME people. There is growing concern that RPE in current use is inadequate at protecting female staff and those from at-risk BAME communities. 31 The objectives of this systematic review were (1) to compare the anthropometric measurements of users across gender and ethnic groups and (2) to assess the effects of FD, gender and ethnicity on RPE fit and effectiveness as measured by fit-test FF scores, fit-test pass rates (PR) or inward leakage.

METHODS
The systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 32 The PRISMA checklist is available in online supplemental appendix 1. A protocol for the review was defined, including inclusion and exclusion criteria but a priori protocol registration was not performed.
Search strategy
A literature search was conducted using Embase and Medline via Ovid, PubMed, Scopus and Web of Science in April 2021. The search strategy (online supplemental appendix 2) included key terms relating to respirators, face masks or PPE, respirator fit, FD or facial anthropometrics and race or ethnicity. Gender anthropometrics and differences between sexes were found to be discussed in most studies; gender search terms were therefore not applied, as these restricted search results. Reference lists of included papers were also screened. Only human studies reported in English were included and no time restrictions were applied.
Study selection and eligibility
Two authors independently screened the search results for relevance based on title and abstract, and unrelated studies were excluded. Subsequently, both authors reviewed full texts to identify studies meeting the inclusion criteria: human studies of any age/gender/ethnicity, assessing half or quarter size filtering facepiece respirators meeting N95/FFP3 standards. Studies pertaining to full-facepiece masks were excluded as these likely relate to different FD. Both disposable and reusable RPE were accepted regardless of brand, design, model and size. Studies relating to qualitative or quantitative fit-testing were eligible. Outcomes related to fit-test FF scores, fit-test PR or inward leak in the context of anthropometrics, gender and/or ethnicity. No restrictions were applied to setting or to participant characteristics such as occupation, ethnicity, race, gender or age. Studies not assessing the effect of at least one of anthropometrics, gender or ethnicity were excluded. Non-English language studies were excluded. Findings were compared and differences were addressed by re-review and discussion until a consensus was reached.
Outcomes
This review aimed to compare the anthropometric measurements of users across gender and ethnic groups and to assess the effect of FD, gender and ethnicity on RPE fit and effectiveness as measured by fit-test FF scores, fit-test PR or inward leakage.

Data extraction
An initial data extraction pro-forma was piloted on a small number of records, modified as required and confirmed. Data extracted related to study characteristics and outcomes, including study design, study population, participant characteristics (age, gender distribution, race distribution), method of FD measurement, anthropometrics data, RPE type, fit-testing protocol, and outcome measures of differences in anthropometrics and in RPE fit. For meta-analysis, we intended to collate data on anthropometric measurements for gender and ethnic groups as well as disaggregated group FF scores and PR.

Analysis
For systematic review, variables including FD, gender and ethnicity were organised into tables and described qualitatively. Association of variables FD, gender and ethnicity with RPE fit were summarised. Limitations and implications for this review are discussed.
Facial measurement means and associated SD were extracted where possible and a meta-analysis was performed for gender-based anthropometrics. Standardised methodologies for anthropometric measurements were employed by the included studies, which were therefore sufficiently similar for meta-analysis. A random-effects meta-analysis was performed using RevMan. 33 Statistical heterogeneity was assessed using I². For facial measurements where I² indicated substantial heterogeneity (>50%), study methods were reviewed for possible explanations. Studies were assessed for clinical and methodological heterogeneity to identify any outlying studies conflicting with the remaining studies across the 14 anthropometrics. Sensitivity analysis was conducted to determine whether the gender-based differences in anthropometrics were robust. Attempts were made to identify studies contributing to heterogeneity for exclusion. Anthropometrics were suspected to differ between ethnicities; results were therefore reviewed to identify groups of studies with conflicting results based on ethnicity for subgroup analysis. Disaggregated anthropometric data was not available to allow for ethnicity-based FD comparisons. Due to heterogeneity in study design, outcome measures and reporting, meta-analysis could not be conducted for RPE performance.
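As an illustration of the pooling described above, the sketch below computes a random-effects pooled mean difference and I² from per-study means, SDs and sample sizes. It uses the DerSimonian-Laird estimator, a common choice that is assumed here; it is not necessarily the exact computation performed in RevMan, and the input values are hypothetical rather than data from the included studies.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (group 1 minus group 2) and its variance for one study."""
    return m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2


def random_effects(effects):
    """Pool (estimate, variance) pairs with the DerSimonian-Laird estimator."""
    y = [est for est, _ in effects]
    v = [var for _, var in effects]
    w = [1.0 / vi for vi in v]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "ci_low": pooled - 1.96 * se,
            "ci_high": pooled + 1.96 * se, "tau2": tau2, "i2": i2}


# Hypothetical face length data (mm): (male mean, SD, n, female mean, SD, n).
studies = [(121.0, 6.5, 60, 112.0, 6.0, 55),
           (119.5, 7.0, 40, 113.5, 6.8, 45),
           (123.0, 6.2, 80, 110.5, 5.9, 70)]
result = random_effects([mean_difference(*s) for s in studies])
print(result)
```

An I² above 50% would correspond to the "substantial heterogeneity" threshold used in the text.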

Risk of bias assessment
The National Heart, Lung and Blood Institute (NHLBI) study quality assessment tool for observational cohort and cross-sectional studies 34 has previously been adapted 35 to assess the quality of studies in the context of anthropometric measurements between gender groups. The NHLBI tool was similarly modified and applied to the studies included in this systematic review based on available guidance from the NHLBI tool.

Patient and public involvement
This research does not directly include patient or public involvement. The aims and questions are informed by national and international experiences of female and BAME HCWs in using RPE during the ongoing pandemic.

RESULTS
Literature search results
Search of the five databases yielded 796 records, with 544 remaining after excluding duplicates (figure 2). Of these, 401 studies were excluded based on title alone and 100 studies based on abstract. These were either unrelated to RPE or pertained to mask design, methods of fit-testing and other predictors such as facial hair and temporal changes. Review articles and conference papers were also excluded. Full texts were reviewed for the remaining 43 records and a further 12 articles were excluded. [36][37][38][39][40][41][42][43][44][45][46][47] Further details of the reasons for exclusion are shown in online supplemental appendix 3. One additional study was included from screening of references. Therefore, 32 articles were identified as eligible for inclusion. 27-30 48-75 Publication year ranged from 1982 to 2021, and all publications were in English. Most studies were published in non-medical journals, largely relating to occupational, industrial or environmental hygiene, ergonomics or health and safety fields. Finally, 15 studies reported anthropometric measurements for meta-analysis. 27 63 72 three studied an Iranian population, 67 71 74 one studied a Taiwanese population 66 and one studied a Latino migrant workers population. 62 Eight studies had populations of mixed ethnicity, 27 59 The most frequently reported anthropometrics are shown in figure 1, which references standardised measurements from the US Air Force anthropometric report. 76 Fit-testing protocols were in accordance with regulations relevant at the time of study, including ANSI and OSHA standards, and in most studies involved quantitative measurement of FF using a PortaCount Plus. Six studies performed qualitative fit-testing 56 58 62 66 67 71 and two assessed inward leak. 50 60 The variety of RPE brands, models and sizes used and fit-testing methods are reported in table 1.
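The screening counts reported above can be reconciled with simple arithmetic. The sketch below only restates the figures given in the text; the function and argument names are illustrative.

```python
def screening_flow(after_duplicates, excluded_title, excluded_abstract,
                   excluded_full_text, added_from_references):
    """Return (full texts reviewed, studies included) from the screening counts."""
    full_texts = after_duplicates - excluded_title - excluded_abstract
    included = full_texts - excluded_full_text + added_from_references
    return full_texts, included


# Counts as reported in the literature search results.
duplicates_removed = 796 - 544
full_texts, included = screening_flow(544, 401, 100, 12, 1)
print(duplicates_removed, full_texts, included)
```

The result matches the 43 full texts reviewed and the 32 studies included.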
Systematic review and meta-analysis findings
Study results were compared qualitatively. Comparisons of anthropometrics between gender and ethnicity groups are shown in table 2. Anthropometric data was available for meta-analysis from 15 studies. 27 71 73 Mean differences and 95% CIs for 14 standardised anthropometric measurements are shown in table 3, with complete data and forest plots available in online supplemental appendix 5. A random-effects meta-analysis revealed all 14 anthropometric measurements were significantly smaller for females (p<0.05). Differences ranged from 0.37 mm for the smallest measurement (nasal root breadth) to 22.05 mm for the longest measurement (bitragion-menton arc). Heterogeneity was substantial (I² >50%) for nine FD. Gender effect was in the opposite direction in one study, with greater face length and face width for females. 71 Sensitivity analysis with exclusion of this study increased the mean difference between genders minimally and improved I² by 10% for face length and 6% for face width. No specific study was identified as contributing substantially to heterogeneity across all 14 measurements. Therefore, no further studies were excluded for sensitivity analysis. Separation of studies by ethnicity did not improve I² substantially but significantly reduced the participant population; subgroup analysis was therefore not performed. Data for anthropometrics of ethnic groups were not available to meta-analyse. Effects of anthropometrics, gender and ethnicity on RPE fit are summarised in table 4, with complete data per study available in online supplemental appendix 6. Disaggregated data for FF scores and/or PR were not available and heterogeneity in study design and reporting hampered direct comparison of RPE fit outcomes between studies.
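The sensitivity analysis described above, in which a study is excluded and the pooled estimate and I² are recomputed, can be sketched as a leave-one-out loop. The pooling step assumes the DerSimonian-Laird random-effects estimator, and the effect sizes are hypothetical values with one study pointing in the opposite direction; none of this reproduces the review's actual data.

```python
def pool(effects):
    """DerSimonian-Laird pooled estimate and I² (%) for (estimate, variance) pairs."""
    y = [est for est, _ in effects]
    v = [var for _, var in effects]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, i2


def leave_one_out(effects):
    """Re-pool after omitting each study in turn; returns (omitted index, pooled, I²)."""
    return [(i, *pool(effects[:i] + effects[i + 1:])) for i in range(len(effects))]


# Hypothetical mean differences (mm) and variances; the last study opposes the others.
effects = [(5.0, 1.0), (5.0, 1.0), (-3.0, 1.0)]
print("all studies:", pool(effects))
for omitted, pooled, i2 in leave_one_out(effects):
    print(f"omit study {omitted}: pooled={pooled:.2f}, I2={i2:.1f}%")
```

A marked fall in I² when one study is omitted suggests that study is a major source of the heterogeneity.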

Qualitative synthesis
FD differ with gender
Gender-based anthropometrics were compared by 15 studies (table 2). Overall, 13 studies demonstrated gender differences, with smaller average female FD for most measurements. 27-29 49 51-55 61 63 66 73 Female measurements were reported to range between 91.5% and 92.5% of the comparable male measurements, although with significant overlap of 20%-50%. 49 53 Some studies reported no gender differences for nasal root breadth, 51 53 54 nose length, 55 nose protrusion, 54 lip width 54 and lower face length, 54 and one reported greater smiling lip length for females. 49 Meta-analysis demonstrated that all 14 anthropometric measurements were significantly smaller for females than males. Differences in nasal root breadth were minimal but still statistically significant (table 3).

FD differ with ethnicity
Ethnicity data was collected by six studies, of which two studies reported anthropometric data and between-group differences. An American study with participants from three ethnic groups found significant differences in all facial measurements, except face length. 51 Interestingly, facial measurements were comparable to early studies comprising a largely Caucasian male population. A South African study including four ethnic groups also reported variation between ethnicities. 28 Asian females had significantly smaller facial measurements and black males had greater nasal root breadth measurements as compared with their white counterparts. An Australian survey collected information on overall facial shape and nose size/shape rather than anthropometric measurements and reported facial characteristics were strongly associated with racial group. The three remaining studies were unable to compare anthropometrics between ethnic groups due to small sample sizes. Studies also drew comparisons between their cohorts and those of previous studies (table 2). Studies of various Asian populations reported significantly different FD compared with Caucasian cohorts, with generally smaller and wider faces. Korean participants had wider face width and nose breadth, narrower nasal root breadth and lip width. 53 Chinese and Iranian participants had wider face width and shorter face length 63 67 and Taiwanese participants had overall smaller faces. 66 FD of males from an ethnically mixed South African cohort were also smaller and wider than for Caucasians. 61 Several studies showed skewed distribution of participant FD compared with the American panel FD such that significant proportions of their cohorts lie outside RFTPs. 50 61 66 71
Gender effects on RPE fit
Gender-based differences in anthropometrics have not consistently translated to a difference in FF (table 4). Of 24 studies comparing PR and/or FF scores between genders, 13 studies demonstrated significant gender effects. 
Of these, 11 studies reported higher fit-test failure rates and/or lower FF scores among females. 28 29 49 52 53 56 63 67 68 71 73 Factors such as facial stubble, which hamper RPE performance, may reduce fit for males such that PR appear similar between genders, but comparison of only clean-shaven males yielded higher PR than for females. 28 Gender was also reported to account for a higher proportion of variability in FF scores in analysis of variance. Association of FD and leak sites was mostly attributed to gender. 50 Two studies did not compare PR but did demonstrate an association of gender-based FD with leak distribution and greater predictability of FF using gender-specific models. 27 50 In comparison, 11 studies reported no gender effects, with similar PR, no effect on FF score or no effect of gender on leak distribution/shape/sizes. 51 74 75 One study reported mixed results with higher PR among males for two of three RPE models but comparable PR overall across all RPE models. 49 A further study reported higher PR among female users. 73 The variable effects of gender on RPE fit may be the result of differences in methodology. Study design was variable, with some studies assessing one model in multiple sizes, multiple models in one size or multiple models and sizes. For example, PR were higher for males than females for certain mask models, vice versa for others or comparable. 29 68 72 Similarly, PR were higher among males when restricted to comparisons between individual mask models and introduction of multiple models improved overall female PR. 49
Ethnic effects on RPE FF scores
FF scores were compared between ethnic groups by only three studies (table 4). Differences in facial measurements between three American ethnic groups did not translate to significant differences in FF scores. 51 A South African study demonstrated FF varied with ethnicity but was underpowered to detect significance of these differences. 
61 This is supported by a larger South African cross-sectional study which reported, while FF scores were lowest among Asians and variable between ethnicities, ethnicity was not a significant predictor for fit in the logistic regression analysis. 28   their mixed cohorts of predominately BAME participants, using single model/size RPE and multiple brands/sizes RPE, respectively. 28 61 In particular, the lowest PR were seen in Asian females. 28 The largest study, an Australian survey, similarly reported the highest failure rates were among Asian HCWs and the highest PR were among white HCWs. 34 Of studies assessing BAME cohorts, ten have reported particularly low PR with significant variability between RPE models. Studies of solely Chinese or Korean cohorts report low PR when assessing subgroups for gender and certain mask type. While some masks were associated with PR between 60% and 87%, others were successful for only 10%-30% of users. 29

and Iranian studies even found some masks were ineffective for all of their participants. 63 67 74 Masks that are a good fit for Caucasian Americans have been shown to provide adequate fit for only 41% of Latino workers. 62 Additionally, two European studies demonstrate low PR among HCWs, suggesting current RPE may be inadequate; however, the ethnic distribution of these populations was not reported. 30 73
Mask factors affect RPE performance
A total of 20 studies compared FF and/or PR between different RPE brands and models; 17 studies demonstrated RPE performance differs significantly based on design. 28-30 52 53 55 57 59 63 64 66-69 71 72 74 One study reported FF score varied with RPE brand for females only, with no correlation in the male group. 49 A study assessing 18 RPE models, however, demonstrated that the number of models and sizes available is associated with FF, rather than the RPE design itself. 54
Risk of bias within studies
Quality assessment is presented in table 5. The majority of studies fail to meet criterion 3 as inclusion and exclusion criteria were not prespecified. The majority of studies also do not provide sample size justifications or power calculations. However, many are still able to meet criterion 4 as they report on variance or effect estimates, as detailed by the NHLBI assessment tool. Of note, several studies do not meet criterion 5 as anthropometric data were not collected.

DISCUSSION
Our review demonstrates significant gender-based variance in standardised anthropometric measurements, with significantly smaller female FD for all measurements. Comparing Asian and black/African groups to Caucasians shows differences in facial geometry such as overall face size and nose measurements. With regard to RPE performance, female and BAME participants have generally low FF scores and/or fit-test PR. However, only a limited number of studies included BAME people in RPE fit-testing. Given the limited number of comparative studies available and heterogeneity in study design, we cannot be conclusive in our evaluation of RPE performance in gender or ethnic groups and their associations with specific anthropometric parameters.
BSI recognises anatomical and structural differences between genders. 77 Our review shows that facial measurements included in RFTPs, namely face length, face width and lip width, are smaller for females. This is consistent with a large gender-based anthropometric study. 78 In the context of fit-testing, most studies collected data limited to FD included in the LANL and NIOSH bivariate RFTPs. A limited number of studies collected additional facial measurements, such as nose dimensions, and showed that these features are relevant to RPE fit. Hence, the inclusion of these additional dimensions and their correlation to RPE performance would be valuable in future studies.
ISO has reported differences in facial characteristics between Caucasian, Sub-Saharan and European facial types. 77 Comparisons between Caucasian and black participants demonstrate that the latter have greater protrusion of lips, greater head depth, and shorter, wider, shallower noses. 26 78 Hispanic workers have significantly larger facial features for 14 measurements than Caucasians, with shorter nose protrusion and head length. 26 Asian participants have statistically different dimensions as compared with Caucasians for 16 anthropometric values. 26 However, only a limited number of studies comparatively evaluate the impact of ethnicity on RPE performance.
Furthermore, disaggregated comparisons are lacking for ethnicities outside predominant American groups (Caucasian, black, Hispanic). Often studies categorise participants as 'Other' which includes a diverse group of Central, South and East Asians, even though there are significant anthropometric differences between these groups based on ancestry. 79 80 Our review also includes studies using American RFTPs as benchmarks, which show significant proportions of Chinese, Korean and Iranian participants' facial measurements lie outside the distribution of American RFTPs. 66 71 81 82 Additionally, individuals from Asian and black ethnic groups continue to be under-represented in RFTPs. There appears to be an urgent need to use fit-test panels that account for ethnicity-specific differences.
Gender-based anthropometric differences are associated with RPE performance in about half of our studies, the majority of which demonstrate that female participants have significantly lower RPE performance, need a variety of mask models for successful fit and are more likely to fail fit-testing altogether. 27-29 50 52 53 56 63 67 68 The heterogeneity in results is likely related to study design, of which RPE availability and the assortment of models on offer are particularly relevant. First, many studies do not make gender-based comparisons of RPE performance for individual mask models, comparing overall fit-testing success between genders instead. This is based on successful fit-testing with at least one respirator, which fails to account for the higher fit-testing failure rates for individual RPE models among females, therefore reducing gender-based differences in RPE performance. Second, provision of one model in limited sizes or RPE designed as 'one-size-fits-all' fails to cater to smaller FD. Increasing RPE choice improves user success rates and reduces gender-based fit-testing differences. For example, a study demonstrated that inclusion of two additional models accounts for a 20% improvement in female PR. 54 Certainly, several studies included here recommend that a variety of RPE should be made available to ensure successful fit-testing. 30 56 58 61 62 65 69 71 74 In practice, implementing a comprehensive fit-testing programme is a financial and logistical challenge. 59 The variety of RPE in different healthcare environments is variable and procurement dependent. It may not be feasible to test HCWs on all available RPE given the time-consuming nature of fit-testing.
● Criteria met; ◐ criteria partially met; ◯ criteria not met. Criteria: (1) were aims and objectives clearly stated? (2) was the study population clearly specified and defined? (3) were inclusion and exclusion criteria for being in the study prespecified and applied uniformly to all participants? (4) was a sample size justification, power description, variance or effect estimates provided? (5) were methods of anthropometric measurement clearly described, valid, reliable and implemented consistently across all study participants? (6) were other independent variables clearly defined, valid, reliable and implemented consistently across all study participants? (7) were the dependent variables clearly defined, valid, reliable and implemented consistently across all study participants? (8) is it clear what was used for analysis or to determine statistical significance estimates? (9) were basic data in the results adequately described? (10) were limitations of the study discussed?
Where criterion (3) is indicated as 'criteria not met', inclusion and/or exclusion criteria were not specified. Where criterion (4) is indicated as 'criteria not met', no sample size justification or power calculation was reported, nor any assessment of variance or effect size. Most studies did not report a sample size justification or power calculation, but the criterion was deemed to be satisfied if a variance or effect estimate was provided.
Anthropometric measurements made from photographs of participants using established landmarks for five of seven facial dimensions. Protection factor scores required to pass not reported. Correlation analysis performed for only a white male subset of the study population. 48 Correlation analysis for facial dimensions and respiratory protective equipment (RPE) fit reported as having been performed but results were not provided as no significant correlations were found. 49 The study was underpowered to assess for race. 50 Facial measurements not entirely in keeping with standard anthropometric landmarks and measurements, as judged by included figure. 51 Physical examination and pulmonary function performed but inclusion/exclusion criteria not stated. 29 52 Study population not specified. Some participants did not test all respirator models and were substituted by others with similar face size categories. 54 Once a successful fit test was obtained other models were not tested. The order of masks tested was applied consistently. 56 Results of effect of gender, age and occupation reported only briefly. 57 Data on facial categories collected rather than anthropometric measurements. Respirator for testing was selected by the tester based on observed facial characteristics rather than measured facial dimensions and Los Alamos National Laboratory categories. Once a successful fit test was obtained other models were not tested. Healthcare workers who failed fit testing were not tracked and, if they returned for second fit-testing sessions, were treated as independent events. 59 Two facial measurements were collected only on a small proportion of participants. SD provided but no between-group comparisons available. Correlation analysis was not performed between the facial dimensions and fit factor. 61 Once a successful fit test was obtained other models were not tested. 62 Estimates of variance and/or effect size were irrelevant to the aim of the study, which was to determine whether the fit of a given respirator size relates to respirator fit test panel facial size categories. 64 Participants who were not clean shaven were initially included in the analysis, which likely skews results given the known effect of facial hair on RPE performance. 28 Factors such as the presence of facial hair were not recorded and could influence the difference in fit factors between genders. 73 Anthropometric data not collected. Ambiguous categorisation of participant ethnicity as South East Asian and non-Asian. 75
Studies report mixed results for ethnicity-based differences in RPE performance. Small comparative studies have demonstrated lower PR for black and Asian females, but with no effect of ethnicity on FF scores. 28 51 61 These studies were likely underpowered to detect subgroup differences. Studies of Asian populations have consistently reported higher rates of fit-test failure among Chinese, Korean, Taiwanese and Iranian participants, further emphasising the need to consider the FD of these populations in RPE design. 29 83 Therefore, designing RPE that fits a wide range of demographics is difficult if RPE is permitted to satisfy standards with limited representation.
In practice, poorly fitted RPE hampers work and user safety. 84 85 Widespread concerns around inadequacies in RPE fit-test access, availability and training have been raised. 86 87 Unfortunately, the proportion of female and BAME HCWs affected and the need for personalised RPE have not been quantified. 85 Studies included in this review were not designed to identify modifications during RPE donning, such as excessive tightening of straps or use of adhesive tape, which may allow for successful fit-testing but indicate poor RPE fit. Notably, skin damage is reported to affect 42%-97% of HCWs and ill-fitting RPE may account for higher rates of adverse reactions among BAME HCWs. 83 88-90 Given the lack of data, specific guidance on modification measures from NHS England and NHS Improvement is limited. 91 Modifications during RPE donning may affect RPE efficacy, and the presence of facial lesions encourages face touching and mask handling, resulting in inadvertent PPE contamination. 92-97

Strengths and limitations
This is the first systematic review and meta-analysis of the influence of gender and ethnicity on RPE, to the best of our knowledge. Our search strategy and eligibility criteria were broad and captured a large number of relevant studies. However, we were limited to English-language databases. We excluded studies in Chinese as we were unable to gain access to the data. This is a significant limitation considering the focus of our review, and the inclusion of non-English studies might have affected the results significantly.
Inherent associations exist between gender and FD as well as multicollinearity between FD, although these associations were not always clearly accounted for or reported by studies. Meta-analysis showed significant heterogeneity existed for nine FD. Of these measurements, those with small magnitude of effect (ie, smaller differences in measurements) such as nasal root breadth (MD 0.37 mm), nose length (MD 3.64 mm), nose protrusion (MD 2.03 mm) and lip width (MD 2.82 mm) may be less relevant or irrelevant to gender-based differences in anthropometrics. By extension, they may be less relevant to RPE fit.
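For readers less familiar with how heterogeneity in pooled mean differences is quantified, the following is a minimal sketch using a fixed-effect inverse-variance model, Cochran's Q and the I² statistic. The choice of a fixed-effect model is an assumption made for simplicity and is not a statement of the model used in this review; the input values are hypothetical and the function name is our own.

```python
def pooled_mean_difference(studies):
    """Fixed-effect inverse-variance pooling (assumed model for illustration).

    studies: list of (mean_difference_mm, standard_error_mm) tuples.
    Returns (pooled MD, Cochran's Q, I-squared as a percentage).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * md for w, (md, _) in zip(weights, studies)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (md - pooled) ** 2 for w, (md, _) in zip(weights, studies))

    # I-squared: share of total variation attributed to heterogeneity.
    df = len(studies) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared


if __name__ == "__main__":
    # Hypothetical male-minus-female mean differences (mm) with standard errors.
    example = [(2.0, 0.5), (4.0, 0.5), (3.0, 0.5)]
    print(pooled_mean_difference(example))
```

With these invented inputs the three studies carry equal weight, so the pooled MD is 3.0 mm, Q is 8.0 on 2 degrees of freedom and I² is 75%, which would conventionally be read as substantial heterogeneity despite the small absolute differences between studies.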
There was significant disparity in study design and methodology in gender-based studies. Assessment of study design confirmed anthropometrics were collected by standardised methods. Only one study reported conflicting results, with FD greater for females. Exclusion of this study did not sufficiently improve heterogeneity. BAME people have different FD to Caucasians, and it was suspected that heterogeneity may be the result of participant diversity. However, subgroup analysis based on ethnicity was not possible as studies measured varying combinations of FD, and ethnicity-based grouping reduced sample size such that meta-analysis would not provide meaningful conclusions. Risk of bias assessment demonstrated most studies failed to meet criterion three, relating to use of prespecified inclusion and exclusion criteria. This may contribute to the heterogeneity observed in meta-analysis of anthropometrics and to differences in conclusions regarding gender-based differences in RPE performance. Several studies failed to account for their sample size through justification, power calculation or estimate of variance/effect. This risks studies being underpowered to detect differences in RPE performance between gender and/or ethnic groups, and may account for the conflicting results. Only a limited number of studies included ethnically diverse participants with all relevant anthropometrics. Hence, we cannot be conclusive in our evaluation of RPE performance in gender or ethnic groups and their associations with specific anthropometric parameters.
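To give a sense of the sample sizes at stake, the sketch below applies the standard normal-approximation formula for comparing two independent proportions, here read as fit-test pass rates in two groups. The pass rates, significance level and power are hypothetical inputs chosen for illustration, the function name is our own, and the approach is a generic textbook calculation rather than the method of any included study.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect p1 vs p2.

    Uses the unpooled normal approximation for a two-sided test:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical pass rates: 90% in one group versus 75% in the other.
    print(sample_size_two_proportions(0.90, 0.75))
```

Under these assumed inputs roughly 97 participants per group would be needed, and the requirement rises steeply as the expected difference in pass rates narrows. Subgroup comparisons within small cohorts are therefore unlikely to detect modest differences.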

Future research
Successfully fit-testing HCWs is particularly important in the current climate. Future studies addressing the disparity in RPE fit will require a review of how respirators are designed and tested, including use of a relevant fit-test panel. Studies should aim to include a diverse group of participants, inclusive of BAME people, to better inform future mask design and fit-testing performance. Studies should provide a variety of mask models, brands and sizes, record modifications made during the donning process, and report the fit-test PR for all mask models tested rather than an overall success rate. Longitudinal studies of how facial anthropometrics influence fit, as well as subsequent user comfort and adverse outcomes, would be useful to inform mask designs. The future clearly lies in personalising fit-testing with modern technology. For example, three-dimensional facial model capture may be used to assess fit in order to reduce the time and costs of fit-testing as well as expedite identification of HCWs who need alternative RPE.

Conclusion
Anthropometric data is key in the design and testing of respirators, and user demographics reflected in respiratory fit test panels (RFTPs) may influence the level of protection respirators provide. Facial measurements vary significantly between genders and ethnicities. Our meta-analysis demonstrates women have significantly smaller facial measurements for 14 standardised measurements compared with men. The literature suggests significant differences in anthropometrics between ethnicities; however, minority groups continue to be under-represented in comparative studies and race-based differences could not be established in our study. The effect of differences in facial anthropometrics on respirator fit and effectiveness is less clear. Over half of studies reporting gender-based comparisons in RPE performance report significantly lower PR among females. Three studies report lower PR among Asian or black participants. However, these PR differences are inconsistently associated with absolute FF scores. FD across ethnic minorities may fall outside the parameters of current RFTPs and impact RPE performance. Therefore, RFTPs need to be expanded to capture the distribution of anthropometric data from all ethnicities and RPE development needs to reflect a more diverse group of users.
Twitter Peter Worsley @PeteWors and Ying Cheong @Ycheong1
Contributors The review was designed by JC, CM, PW and YC. The search was conducted by JC under supervision from YC. Records were screened by JC and NA independently. Data were extracted and meta-analysis conducted by JC. JC led the writing of the manuscript for publication with significant contributions from all authors. All authors approved the final article and are guarantors of the study. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Competing interests None declared.
Patient consent for publication Not applicable.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.
Open access This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.