How will the sustainable development goals deliver changes in well-being? A systematic review and meta-analysis to investigate whether WHOQOL-BREF scores respond to change
  1. Suzanne M Skevington,
  2. Tracy Epton
  1. International Hub for Quality of Life Research, Manchester Centre for Health Psychology, Division of Psychology and Mental Health, University of Manchester, Manchester, UK
  1. Correspondence to Dr Suzanne M Skevington; suzanne.skevington{at}


Introduction The Sustainable Development Goals (SDGs) 2015 aim to ‘…promote well-being for all’, but this has raised questions about how its targets will be evaluated. A cross-cultural measure of subjective perspectives is needed to complement objective indicators in showing whether SDGs improve well-being. The WHOQOL-BREF offers a short, generic, subjective quality of life (QoL) measure, developed with lay people in 15 cultures worldwide; 25 important dimensions are scored in environmental, social, physical and psychological domains. Although validity and reliability are demonstrated, clarity is needed on whether scores respond sensitively to changes induced by treatments, interventions and major life events. We address this aim.

Methods The WHOQOL-BREF responsiveness literature was systematically searched (Web of Science, PubMed, EMBASE and Medline). From 117 papers, 15 (24 studies) (n=2084) were included in a meta-analysis. Effect sizes (Cohen’s d) assessed whether domain scores changed significantly during interventions/events, and whether such changes are relevant and meaningful to managing clinical and social change.

Results Scores changed significantly over time on all domains: small to moderate for physical (d=0.37; CI 0.25 to 0.49) and psychological QoL (d=0.22; CI 0.14 to 0.30), and small for social (d=0.10; CI 0.05 to 0.15) and environmental QoL (d=0.12; CI 0.06 to 0.18). More importantly, effect size was significant for every domain (p<0.001), indicating clinically relevant change, even when differences are small. Domains remained equally responsive regardless of sample age, gender and evaluation interval.

Conclusion International evidence from 11 cultures shows that all WHOQOL-BREF domains detect relevant, meaningful change, indicating its suitability to assess SDG well-being targets.

  • sustainable development goal
  • wellbeing
  • quality of life
  • responsive
  • culture

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial IGO License (CC BY-NC 3.0 IGO), which permits use, distribution, and reproduction for non-commercial purposes in any medium, provided the original work is properly cited. In any reproduction of this article there should not be any suggestion that WHO or this article endorse any specific organization or products. The use of the WHO logo is not permitted. This notice should be preserved along with the article’s original URL. See:

Key questions

What is already known about this topic?

  • Sustainable Development Goal 3 (SDG 3) aims to ‘… promote well-being for all, at all ages’ through targeting new interventions, but subjective assessments are needed to complete and complement existing objective indices (eg, gross domestic product), and enable accurate conclusions to be drawn in 2030.

  • A cross-cultural assessment, the WHOQOL-BREF, offers a unique subjective, generic quality of life (QoL) measure designed with users, to assess global health and well-being; data exist on 60 000 adults living in 100 countries.

  • Although validity and reliability are established, it is unclear whether WHOQOL-BREF scores respond to changes induced by treatment, interventions and major life events; a systematic review of the literature with meta-analysis examines whether these changes are clinically and socially meaningful.

What are the new findings?

  • When changes in scores for the four WHOQOL-BREF domains were tested (eg, before and after treatment), both physical and psychological QoL domains showed small-to-moderate changes; small changes were found for the social and environmental QoL domains.

  • Nevertheless, the effect size for every domain was significant, indicating that each domain delivers relevant and meaningful change, even when the differences are small.

  • WHOQOL-BREF scores remained responsive, regardless of the sample age, gender and evaluation interval.

Recommendations for policy

  • The unique features and high performance of the WHOQOL-BREF make it suitable for monitoring subjective QoL in SDG projects where well-being changes are expected (eg, SDG 3; SDG 9; SDG 17) and also to compare achievements of global targets, nationally and internationally.

  • As WHOQOL-BREF domains detect relevant changes in QoL following treatments, interventions and major life events (eg, earthquake, abuse), practitioners, policymakers and researchers in clinical and social fields can be assured of measurement responsiveness.


When the Sustainable Development Goals (SDGs) superseded the Millennium Development Goals (MDGs)1 in 2015, the United Nations (UN) built on MDG success in meeting the needs of low/middle income countries, and renewed plans for action and change.2 While SDG 3 aims to ‘ensure healthy lives, and promote well-being for all, at all ages’, this forms only part of a broader health and well-being (WB) brief within the SDGs. During the MDG programme, environmental sustainability became a growing concern, as health and WB costs from hazardous events (eg, droughts, flooding) escalated, and these are increasingly linked to climate change.3 In this paper, we inquire how we will know whether the SDGs have significantly improved WB in 2030. Evaluation is a central theme of this work.

Learning from the MDGs

Despite considerable international agreement about MDG aims and interventions to implement them, decisions were inconclusive about which outcome measures should evaluate MDG goals. Effective MDG interventions included training village health workers, free insecticide treatment of bed nets, eliminating basic health service fees in low/middle-income countries, improving access to sexual and reproductive care and offering combination drug therapies for HIV infection.4 One line of thinking is that outcome measures should be intervention-specific, and hence different assessments are needed to address heterogeneous interventions designed to improve quality of life (QoL) and survival.4 Although specific outcome measures are suited to evaluating single, and sometimes similar interventions, their data limit conclusions when outcomes from diverse targets need to be compared. Despite broad agreement that quality, timely and reliable disaggregated data are essential to measuring progress, and recognition that gross domestic product (GDP) alone would be insufficient to evaluate the SDGs, in 2015 the UN2 reported low agreement about which alternative and additional measures to select. Although good generic measures are available to compare many different populations, conditions and settings, rather than endorsing one of these, the UN-SDG panel decided to go beyond GDP by developing new measures of progress; consequently, they proposed subjective well-being (SWB) indicators, for example, positive mood (Sustainable Development Solutions Network, p69).5 Without generic subjective data from global cultures, it will not be possible to draw conclusions about progress in 2030 when SDG outcomes are finally and formally evaluated. Is it possible to find out whether ‘anyone has been left behind’ (IUCN, UNEP, WWF, p3–35),3 without asking about perceptions?

Within the context of this debate, we consider an international, well-developed, multilingual, generic tool, with an established track record. Acceptable and feasible to assess subjective QoL in many cultures worldwide, the WHOQOL-BREF was developed through an international collaboration convened by the WHO, Geneva. The WHOQOL group aimed to advance standards in cross-cultural measurement, and produce a global measure that would complement ‘objective’ indices on standard of living (eg, GDP) and health (eg, mortality rate).6 Multiple language versions of the WHOQOL were simultaneously developed in 15 cultures worldwide, from an internationally agreed protocol which has since been used by a network of developers in around 100 cultures. Endorsement of a suitable subjective measure to evaluate SDG goals will enable ongoing projects to take immediate advantage of evaluating key outcomes from interventions, treatments and major life events, using a common metric.

Requirements for evaluating the SDGs

A good subjective measure for SDG purposes would need to show (a) measurement properties of reliability and validity, at global, regional and national levels, and moreover that its scores respond sensitively to changes in people’s lives; (b) a capacity to detect positive and negative changes, reflecting experience; (c) acceptability to different cultures, and feasibility to use; (d) easy administration and scoring, with scores that are readily interpretable by decision makers and practitioners; (e) a capacity to assess well people, including those receiving health promotion or disease prevention interventions, and unhealthy populations of all types (eg, chronic mental health, neglected tropical diseases, intractable infections, non-communicable diseases, road traffic injuries); (f) that the results can indicate where inequalities between groups exist, so that ‘nobody is left behind’; and (g) standard procedures to enable cultural adaption, translation and standardisation when new language versions are needed, potentially for 7300+ cultures worldwide. Such properties will enable sound comparisons of diverse cultures, for example community happiness in Bhutan; berry picking in Sweden; ‘feeling fed-up’ in Britain. The availability of this type of tool opens up opportunities for ‘invisible’, indigenous groups (eg, ‘first nations’), to give ‘voice’ to their QoL views when engaging with SDG projects, and perhaps for the first time, to communicate them to policymakers.

Although generic economic indices like GDP and population health estimates (eg, disability-adjusted life years) were used in MDG evaluations, no single measure can fulfil this challenging brief. When it comes to SWB and health, there is no substitute for obtaining meaningful information directly from individuals about their personal experiences, and then testing whether the intervention significantly changes it. Although the UN acknowledged the importance of assessing experience in 2015, and designated WB (defined as positive mood) as an indicator for SDG 17 among 100 indicators in total,5 by March 2016, WB had disappeared from the new final list of 230 UN approved SDG indicators.7 Furthermore, all 230 are exclusively objective. Without subjective empirical data, comparing SDG achievements remains partial and therefore contentious. Incomplete data for MDG targets frustrated global policymaking during the planning period leading to publication of the SDG plan in 2015: they reported progress was limited and hard won.2 A global evaluation strategy of using a common metric is urgently needed to record the personal experience of all communities participating in SDG projects in the next 12 years. A standardised subjective measure added to existing population-based indicators would provide a more comprehensive and conclusive evaluation. Without sound data it will be impossible to conclude whether WB has been significantly improved by the SDG programme, wasting more years, and considerable human and financial resources.

WB or QoL?

Although promoting WB is a specified aim of SDG 3, the UN2 did not say whether WB or QoL should be measured alongside health. Conceptual distinctions between subjective QoL and SWB are opaque, raising questions about whether these concepts are identical, ‘nested’ within each other or different.8 Lack of clarity confounds policymakers and practitioners who seek to make informed choices about the best measure to use. QoL assessment is well suited to measuring SDG 3 outcomes, with its reputation for rigorously developed multidimensional instruments that are relevant to mental and physical health. Furthermore, the five dimensions of mood (positive and negative), cognition and evaluation that typically compose SWB9 can be readily mapped onto some of the 25 QoL dimensions assessed by the WHOQOL. New international empirical findings10 strongly suggest that SWB is ‘nested’ within QoL. QoL as measured by the WHOQOL offers a scale with exceptional multidimensionality and conceptual integration, so is highly appropriate to address the very wide range of life qualities addressed by the 17 SDGs.

WB has also been viewed as purely ‘objective’, and assessed by counting material household goods (eg, TVs, cars, house size),11 but some indices show a loose relationship with subjective measures. Over a decade of analysis examining the relationship between GDP and SWB, Easterlin and Sawangfa12 showed that this association is neither strong nor linear, as generally assumed. While SWB in lower income countries does increase as GDP rises, among higher income countries rising GDP has little or no association with SWB,12 illustrating that objective GDP indicators are no substitute for subjective measures of WB or QoL.

Although the history of health-related QoL assessment reveals several high-quality generic measures (eg, European Quality of Life Scale-five dimensions (EQ-5D), Short Form-36 Health Survey (SF-36)), the unique conceptual breadth of the WHOQOL with its 25 facets scored in four domains enables WB to be evaluated for many SDGs: for health goals where WB is intrinsic (SDG 3), and for other SDGs (eg, SDG 11), where QoL evidence could show that people feel physically safer and more secure in sustainable cities. In SDG 17, better QoL could strengthen ways to implement and revitalise the global partnership for sustainable development. Multiple dimensions of a profile are especially valuable for unpacking the impact of QoL dimensions where trade-offs are unknown; for example, action to conserve biodiversity may enhance psychological QoL, and may also conflict with more immediate human needs for adequate housing, better nutrition and being employed (relating to SDG 15, SDG 2 and SDG 16),2 so potentially abrading environmental QoL. Including a quality generic, subjective profile could complement objective economic measures (eg, GDP), and balance the evaluation portfolio.

Advances in QoL assessment

The WHOQOL suite of instruments offers a series of patient and person-reported outcome measures (PROMs) that provide insights into thoughts, feelings and personal experiences relating to QoL and health. The WHOQOL was the first generic, cross-cultural PROM designed explicitly for global use. Focus groups of patients, health professionals and community members were convened in 15 cultures, pooling their proposed concepts and wording for the questionnaire. Due to the way it was developed, it is highly acceptable to users. Acceptability promotes lower survey refusal rates and reduces missing data, which in turn improve the accuracy that decision makers need when drawing conclusions and taking action. Improvements to validity that arise from using PROMs methods have accelerated routine use in randomised controlled trials (see

A strength of the WHOQOL development was the innovatory ‘spoke-wheel’ methodology that was devised by a multidisciplinary, multinational collaboration to simultaneously create many highly equivalent language versions from a commonly agreed concept.13 Issues of national importance to QoL were identified locally, and then appended to WHOQOL-BREF core items, to complete the QoL concept for that country,14 for example, eating and appetite in Hong Kong. This process tailored the evaluation to the culture, and hence resonates with the UN Inter Agency Expert Group policy7 to allow countries to devise culturally appropriate indicators that are not on the approved international SDG list. National progress in SDG targets will be tracked then later aggregated up to regional and international levels.

Over 20 years, a WHOQOL-BREF manual has facilitated around 100 culturally adapted translations globally, completed by over 60 000 sick and well adults. This process potentially enables universal QoL comparisons between cultures as diverse as Assam tea pickers, Wall Street bankers and Peruvian mineworkers, by producing highly equivalent cross-cultural data. Similarly, the WHOQOL-BREF’s globally agreed importance items are clustered and scored in physical, psychological, social and environmental QoL domains. This breadth of dimensions uniquely extends the WHOQOL concept well beyond the physical and psychological, commonly found in health-related QoL measures, as it includes QoL related to social relationships, the environment (eg, physical safety, financial resources, home environment), and an unusual spiritual, religious and personal beliefs component within the psychological domain (see detail in references 6 15 16). These features extend its application to innovative settings beyond health. A profile of scores offers a fine-grained multidimensional analysis, and a priori predictions can be made about which domains and items will likely respond most to the impact of an intervention. Furthermore, the WHOQOL-BREF together with its importance ratings has been recently used at a one-to-one level to assess QoL, and prioritise individual patient healthcare.17 18 This research is underpinned by the WHO definition of QoL: ‘An individual’s perception of their position in life, in the context of the culture and value systems in which they live, and in relation to their goals, expectations, standards and concerns’ (The WHOQOL Group, p. 43)19. Briefly, QoL is subjective and in the eye of the beholder, as it is invisible to observers; hence, it is best judged by the person themselves.20

The WHOQOL-BREF results are an aid to policymaking and allocating budgetary resources in the US State of Connecticut Department of Mental Health and Addiction Services (DMHAS).21 Since 2007, 14 000+ adult mental health service users have been invited to complete the WHOQOL-BREF annually, which takes about 6 min in English. This evidence is disaggregated by service, demographics and health, enabling policymakers to prioritise service delivery to disadvantaged groups with the aim of redressing QoL deficits. Following a 3-year pilot, DMHAS annually allocates budgetary resources to promote effective interventions, and make service improvements,21 illustrating how disaggregated WHOQOL-BREF data (eg, by age, education, gender, culture) can be used to reduce inequalities. It would therefore be valuable for delivering SDG 3 targets to relevant groups, being eminently suited to solving global policy issues.

The importance of responsiveness

Global psychometric properties of the WHOQOL-BREF show that scores are reliable and valid across diverse cultures,15 and international reference data are available for gender, age and health status groups.16 For some countries, national norms are available (eg, Australia).22 Other studies show that the scores distinguish sick people from well, and across a wide range of physical and psychological disorders, and social and environmental conditions;23 this is evidence of good discriminant validity.

Among key psychometric properties, responsiveness (also sometimes referred to as sensitivity to change) is perhaps the most important requirement. In clinical healthcare and interventions, responsiveness shows the degree to which score changes are clinically relevant. Responsiveness is the ability of a measure to detect clinical change, particularly where differences in outcomes occur, including small ones. Responsive instruments have scores with the capacity to change sensitively and appropriately, in accordance with the person’s condition or situation, and this information needs to be interpreted. However, there are many different ways of measuring responsiveness. In a longitudinal study of depressed patients in primary care in six cultures who received anti-depressant medication over 9 months, changes in WHOQOL-BREF domain scores were examined in relation to changing depressive symptoms to see how much QoL scores would increase if depression improved, and decrease if deterioration occurred.24 Without such measurement information about change and its relevance, the success of an intervention or trial remains unknown.

The systematic review (SR) technique used in this study can assess responsiveness to change by combining and assessing known evidence in meta-analysis. As methods of calculating responsiveness are controversial,25 we followed Norman et al.’s recommendation26 to limit calculations to Cohen’s effect size.27 Through evaluating collective knowledge about WHOQOL-BREF responsiveness, we shall draw conclusions about the clinical and social relevance of this measure. For instance, this information has clinical value when managing long-term treatment for chronic conditions, such as Parkinson’s disease.20

Responsive instruments can then be used to monitor new services, modes of service delivery, the impact of policy changes, reactions to political crises and environmental disasters, such as those outlined as SDGs. For SDG targets, it will be essential to determine whether QoL has improved significantly following planned interventions, for instance, providing adequate and reliable nutrition to improve QoL. This question could be investigated in Addis Ababa today using the Amharic version of the WHOQOL-BREF, and could support action in this low income country. As no other generic global health-related QoL measure has the same capacity to embrace the plethora of situations and conditions outlined in the SDGs, the WHOQOL-BREF should be the subjective measure of choice.

Establishing the responsiveness of social and environmental domains in the WHOQOL-BREF is important to understanding the breadth of QoL impacts from adverse conditions (eg, earthquakes) and afterwards. As yet, the WHOQOL-BREF’s responsiveness has not been comprehensively examined with regard to a variety of situations and populations, despite several studies which assessed outcomes after diverse interventions and major life events. Examples include radiation exposure,28 earthquake survival,29 housing elderly Sichuan Chinese in an earthquake zone,30 Palestinian conflict in Gaza,31 Nigerian refugees,32 women torture survivors,33 slum upgrades34 and poverty in low/middle-income countries.35 Use of the WHOQOL-BREF in these challenging fields indicates that social and development practitioners view it as a suitable tool.

The ability of instrument scores to respond to changing conditions is perhaps the most important psychometric property of health and WB measures. This property is nowhere more applicable than when monitoring change induced by environmental and social interventions, including many addressing SDG targets. Gathering multidimensional subjective information about how QoL is affected provides much needed information when trying to understand why some interventions succeed, and if they fail, how that intervention might be adjusted to become more effective.

The main aim of the present research was to conduct a systematic review and meta-analysis of global evidence on the responsiveness of the WHOQOL-BREF to change. A further aim was to assess the relevance or meaning of WHOQOL-BREF domain scores in clinical and social settings.


Search strategy

Keyword searches were conducted through Web of Science/Knowledge, PubMed, Medline and EMBASE, up to May 2017. There were two search filters: the first identified use of the WHOQOL-BREF through the search terms ‘WHOQOL-BREF’ and ‘WHOQOL’. The second filter was for responsiveness, searching the terms ‘responsive’, ‘responsiveness’, ‘sensitivity’ and ‘sensitivity to change’. A grey literature search accessed documents from international governmental organisations (eg, UN; WHO); a hand search of government organisations was conducted.

Inclusion/exclusion criteria

For inclusion in this review, studies had to (a) assess responsiveness of the WHOQOL-BREF by reporting measures for all four domains, (b) include adult participants (adult was defined by the study culture), (c) include data (or provide data on request) that could be used to calculate Cohen’s d, (d) score the WHOQOL-BREF according to manual instructions, (e) provide a report in English or French (or translation) and (f) apply a longitudinal or repeated measures design (recommended by the Scientific and Advisory Committee of the Medical Outcomes (SACMOS) Trust).36 Studies with the following designs were included: (a) pre-intervention and postintervention/treatment compared with control, (b) postevent compared with a control group who had not experienced this event, (c) pre-intervention and postintervention/treatment with no control, (d) pre-event and postevent with no control, and (e) change over time as an event/illness progressed.

Studies were excluded if (a) they had not assessed WHOQOL-BREF responsiveness for all four domains; (b) participants were not adult; (c) they did not report (or provide) sufficient information to calculate responsiveness; (d) WHOQOL-BREF scoring and transformation did not follow official procedures; (e) translations other than English or French were unavailable; and (f) a longitudinal or repeated measures design was not used.

Moderator variables

Information about potential moderator variables was extracted for each study. These were: characteristics of the sample (ie, mean participant age, percentage of females), and study characteristics (ie, the interval of time between baseline and follow-up assessments).

Coding reliability

All data extraction was completed independently by both authors. Where disagreements arose, these were resolved by discussion between the two, until 100% agreement was reached.

Meta-analytic strategy

The effect size used in the analysis was Cohen’s d.27 Values of effect size were interpreted according to Cohen’s threshold criteria: <0.20 is interpreted as trivial, 0.20 to 0.50 as small, 0.50 to 0.80 moderate and >0.80 large. For independent samples, means, SDs and Ns at pre-intervention and postintervention were used to calculate effect size. If only postintervention means, SDs and Ns were reported, these were used. For repeated measures samples, means, SDs, N and correlation between two time points were used. If these data were unavailable, mean change, N and t value were used, or mean change, N and P value.
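As an illustration of the calculations described above, the following Python sketch computes Cohen’s d for independent and repeated measures samples, and applies the threshold labels given in the text. It is not the authors’ code: the function names are hypothetical, the pooled-SD and correlation-adjusted formulas are common textbook conventions assumed here, and the handling of values falling exactly on a threshold is likewise an assumption.

```python
import math


def cohens_d_independent(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD (assumed convention)."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    return (m2 - m1) / pooled_sd


def cohens_d_repeated(m_pre, sd_pre, m_post, sd_post, r):
    """Cohen's d for repeated measures, using the pre/post correlation r.

    Assumed convention: the SD of the difference scores is rescaled by
    sqrt(2 * (1 - r)) so that d is on the same scale as the independent case.
    Requires r < 1.
    """
    sd_diff = math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    return (m_post - m_pre) / sd_within


def interpret_d(d):
    """Label an effect size with the thresholds stated in the text.

    Assumption: values exactly on a boundary are assigned to the higher band
    at 0.20 and 0.50, and to 'moderate' at 0.80.
    """
    size = abs(d)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size <= 0.80:
        return "moderate"
    return "large"
```

For example, with hypothetical values, `cohens_d_independent(50, 10, 30, 55, 10, 30)` gives d = 0.5, which `interpret_d` labels as moderate under the boundary assumption noted above.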

A random effects model weighted by sample size was used to calculate the overall effect size (d), the CI, the significance of heterogeneity (Q) and the extent of heterogeneity (I²). The metan command was used in STATA V.11, for overall effect size,37 and the metareg command was used to explore moderators.
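The pooling step can be sketched as follows. This is a minimal illustration assuming a DerSimonian-Laird random effects estimator with inverse-variance weights, which is one widely used approach; the exact computations performed by the STATA commands named above may differ, and the function name and inputs here are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random effects model.

    effects   : list of study effect sizes (eg, Cohen's d)
    variances : list of the sampling variances of those effect sizes
    Returns the pooled effect, its 95% CI, Q, I-squared (%) and tau-squared.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect estimate, needed to compute the heterogeneity statistic Q.
    fixed = sum(w * d for w, d in zip(weights, effects)) / sum_w
    q = sum(w * (d - fixed) ** 2 for w, d in zip(weights, effects))
    df = k - 1

    # Between-study variance (tau-squared), truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau-squared to each study's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * d for w, d in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)

    # I-squared: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }
```

The function expects one effect size and one sampling variance per study; how those variances are derived from the reported means, SDs and Ns is left outside this sketch.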


Study selection

Figure 1 shows the flow of articles in the study. The literature search identified 117 articles. After eliminating duplicates, this left 83 articles for screening. After assessing the full text of 59 articles for eligibility, a total of 15 papers containing 24 studies were included in the final review. Key reasons for exclusions were they (a) presented responsiveness for WHOQOL measures other than the WHOQOL-BREF (eg, WHOQOL-100) (n=4); (b) collected WHOQOL-BREF data as the ‘gold standard’, but only reported the responsiveness of other measures (n=12); (c) presented reports in languages without a translation (eg, Chinese, Korean) (n=3); (d) did not report results for all four domains (n=5); (e) did not provide sufficient data to calculate effect size (n=5); (f) showed scoring problems which could not be rectified (n=6); and (g) provided literature reviews without useful data (n=9).

Figure 1

Flow of papers through the study. QoL, quality of life.

Characteristics of included studies

Details of the included studies are shown in table 1.

Table 1

Sample characteristics, study characteristics, study design and effect sizes

Sample characteristics

Reports on the WHOQOL-BREF responsiveness are presented historically in table 1, starting from the year 2000. Studies were conducted in a total of 11 cultures worldwide: Europe (n=15), East and South Asia (n=6), Australasia (n=2) and South America (n=1). Samples included a range of illnesses and disabilities categorised as physical health (n=17), mental health (n=3) and well (n=4). They included inpatients, outpatients, primary care, preventative care, social care and the community. The mean sample age was 50.26 years (SD=13.55), and included 60.46% women (SD=23.37). Study sampling was quasi-experimental with allocation to naturalistic groups, randomised and controlled groups, ‘convenience’ samples and structured survey populations.

Study design

Studies varied in design: pre-intervention and postintervention data with no control (n=18); pre-event and postevent with no control (n=3); change after an event, compared with a control group without experience of the event (n=2); and pre-intervention and postintervention compared with a control (n=1). SACMOS36 information categorised the type of responsiveness in each study, and is recorded in table 1 (right-hand column).

Study characteristics

The mean interval between collecting the baseline measures/intervention and follow-up, was 33.19 weeks (SD=51.15).


Across 24 tests (n=2084) the WHOQOL-BREF showed small-to-moderate responsiveness in detecting differences over time or health status in all four domains: physical (d=0.37; CI 0.25 to 0.49), psychological (d=0.22; CI 0.14 to 0.30), social (d=0.10; CI 0.05 to 0.15) and environmental QoL (d=0.12; CI 0.06 to 0.18) (see table 2 and online supplementary figures S2-S5 for forest plots: Physical (figure 2); Psychological (figure 3); Social (figure 4); Environmental (figure 5) QoL). These results suggest that the WHOQOL-BREF is responsive across all the domains, and across a wide variety of interventions and events.

Table 2

Effect sizes for the four WHOQOL domains

As there was significant heterogeneity for each domain (see table 2), moderation analyses were conducted. No differences in responsiveness were found due to the sample characteristics of age and gender, or for the study characteristic of the interval between baseline and follow-up assessments (see table 3). This suggests that the WHOQOL-BREF is equally responsive regardless of the age and gender of the population studied, or the time interval between the intervention/event and measurement.

Table 3

Potential moderating effects of the responsiveness of the WHOQOL-BREF
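A moderator analysis of this kind can be approximated by a weighted regression of study effect sizes on a study-level covariate (eg, mean sample age). The sketch below is an illustration under stated assumptions, not a reproduction of the metareg analysis: it treats the between-study variance (tau2) as a known input rather than estimating it, handles a single moderator, and uses hypothetical names throughout.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least squares regression of effect sizes on one moderator.

    Weights are 1 / (sampling variance + tau2). tau2 is supplied by the
    caller (an assumption of this sketch; it is not estimated here).
    Returns the intercept, slope, standard error of the slope and z value.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)

    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total

    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    if sxx == 0:
        raise ValueError("moderator has no variation across studies")
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, moderator, effects)
    )

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # valid when the weights are inverse variances

    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "z": slope / se_slope,
    }
```

A slope whose z value is close to zero would be consistent with the moderator having no detectable association with responsiveness.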


A systematic review of the global literature and meta-analysis of results confirmed that all four WHOQOL-BREF domain scores change significantly over time in response to an intervention, treatment or major life event. Across 24 studies containing 2084 adults living in 11 diverse cultures we found small to moderate effects for the physical and psychological domains, and small effects for the social relationships and environmental QoL domains. Furthermore, WHOQOL-BREF domains were found to be responsive to change, regardless of the sample’s mean age, its gender composition, and the elapsed time between baseline and follow-up assessments. These findings demonstrate that WHOQOL-BREF scores are responsive to change across wide cross-cultural variations in local healthcare, diagnostic criteria and treatment delivery.

We also aimed to assess whether changes in WHOQOL-BREF domains were relevant and meaningful in clinical and social settings. Highly significant effect sizes found in each domain demonstrated that when scores change, these changes are relevant. More importantly, relevant results were found even for the social and environmental QoL domains where the changes had been small. In clinical terms, this means that WHOQOL-BREF domains have the capacity to detect even small changes induced by treatments or events. These positive findings indicate that the instrument shows good performance in measurement terms, and additionally confirm that the WHOQOL-BREF possesses ‘an essential aspect of validity’ (Norman GR, p816)26 among the many other properties previously tested. However, these score changes should also be interpreted in conjunction with subjective perceptions of change during treatment, care or events. This second piece of perceptual information is important when managing the WB of individual patients or clients. Normative country data can also assist with interpretation. More recent country data have been collected by some national WHOQOL centres since 2004,15 and these reference data should be requested from the WHOQOL country centre concerned, when seeking permission to use a particular language version.

A predominance of physical health conditions among the meta-analysis studies may explain the strongest results for the physical domain. Several mental health studies contributed fairly strong responsiveness evidence to the psychological domain, but psychological involvement connected with chronic physical conditions adds weight to this. As very few studies directly investigated social relationships (eg, women with violent partners),38 and environmental interventions (eg, rehousing schizophrenic patients)23 or events (eg, earthquake survival),39 the apparent weakness of these domains may be due to limited information about them. New research should investigate selected settings where, a priori, these two domains of QoL might be expected to respond significantly (positively or negatively) to these types of events. Although cultural variations exist, events that impact on social QoL could include divorce, marriage and bereavement. For refugees, internally displaced people, homeless, indigenous and jobless populations,40 environmental QoL could be salient. A further implication of discovering that the environmental QoL domain is responsive is that it can now be used to assess environmental WB in many SDGs, both related to health and beyond (eg, SDG 9). Public health targets among the SDGs that require SWB assessment will benefit from access to environmental QoL information, and its users can be reassured that this domain performs as well as the other domains, and across key sociodemographic features.

Small sample sizes limited many studies included in the present SR, so data from large international surveys would improve conclusions. Nevertheless, these heterogeneous, cross-cultural data strengthen global conclusions about generalisability. Although effect size calculations were limited to Cohen’s d, future studies should compare WHOQOL-BREF results across the range of responsiveness indices.
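To illustrate what comparing responsiveness indices might involve, the minimal sketch below computes three commonly used ones for paired baseline and follow-up scores. The scores are invented for illustration and are not data from this review, and the function names are hypothetical. The exact variant of Cohen’s d used in the meta-analysis is not restated here, so the pooled-SD form shown is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical domain scores (0-100 scale) for six people; illustration only.
baseline = [50.0, 56.0, 44.0, 63.0, 38.0, 69.0]
follow_up = [56.0, 60.0, 50.0, 66.0, 47.0, 70.0]


def cohens_d(before, after):
    """Mean change divided by the pooled SD of the two time points."""
    pooled_sd = sqrt((stdev(before) ** 2 + stdev(after) ** 2) / 2)
    return (mean(after) - mean(before)) / pooled_sd


def baseline_effect_size(before, after):
    """Mean change divided by the baseline SD."""
    return (mean(after) - mean(before)) / stdev(before)


def standardized_response_mean(before, after):
    """Mean change divided by the SD of the individual change scores."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


for index in (cohens_d, baseline_effect_size, standardized_response_mean):
    print(f"{index.__name__:28s} {index(baseline, follow_up):.2f}")
```

The three indices differ only in the denominator, so the same mean change can look small on one index and large on another. This is one reason a comparison across indices would be informative.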

Some potential applications of the WHOQOL-BREF to the SDGs include measuring the impact on QoL of living with HIV, tuberculosis (TB) or malaria; QoL costs to families from maternal mortality; assessing whether providing universal healthcare improves QoL; improvements to QoL in women from greater access to reliable contraception; the cost to parents’ QoL when a child dies from a disease that is preventable by vaccination (eg, measles); and the impact on family QoL of a stillbirth or a baby with spina bifida. For other non-health SDGs, empirical evidence of QoL changes could potentially be used to evaluate population well-being after a peace treaty (eg, Syria), when social justice is restored (eg, Turkey), if food security is delivered (eg, Ethiopia), after sustainable transport is established (eg, Bangladesh), and as household poverty is reduced (eg, Bolivia).

Agreeing a set of common international items to evaluate all SDG targets remained a major challenge in April 2017, when the list of 230 indicators was disclosed.7 Not one indicator lists SWB or QoL assessment, despite earlier inclusion.5 Due to its generic nature, the QoL dimensions of the WHOQOL could monitor any SDG where people’s QoL perceptions need assessment. Instead of offering a standard core of indicators to be applied by every country and then adding a few culturally appropriate indicators to satisfy local needs, countries are now invited to select from the UN list. Consequently, idiosyncratic sets could be chosen by around 200 countries to address the same target. The resulting data will confound cross-national comparisons in 2030, as common indicators drawn from a standard international core will not be available. Nevertheless, the IAEG7 has built in some flexibility of cultural adaptation; a strategy that will satisfy countries who feel constrained by international demands for uniformity. This IAEG process has synergy with WHOQOL procedures for generating ‘national items’ on important local issues that are assessed with the core WHOQOL-BREF measure; such adaptability is yet another feature to recommend its use in the SDGs.

A spectrum of affordable technological devices and internet communication networks now enables subjective QoL information to be collected remotely in many countries. Mobiles, computers and ‘smart’ phones are almost ubiquitous, but cheaper new devices will replace them by 2030. National participation in the SDGs depends on having local expertise in computing, electronics and statistics2; administration methods must necessarily be culturally acceptable and affordable in GDP terms, and may need additional resources from external development investors.2 Strategies developed in culturally appropriate ways can support vulnerable groups who might otherwise be overlooked. Lack of ‘know how’ may be most acute in low-income countries and minority communities, and paradoxically, it is these sectors that have the most pressing need to bring information about their QoL to the attention of global policymakers. Ensuring that this situation is avoided will be key to SDG aims.


A 20-year track record of advanced cross-cultural research on the WHOQOL-BREF has resulted in a short, state-of-the-art measure with high-quality measurement performance. We show how WHOQOL-BREF domains demonstrate responsiveness to clinical and social change that is relevant and meaningful, which consolidates knowledge about the instrument’s validity. As this versatile tool can assess adult QoL in many cultures and conditions, it is highly suitable for use in the many different contexts addressed by SDG targets, where knowing about changes to WB will be essential to comprehensive evaluation.

Supplementary file 1

Supplementary file 2

Supplementary file 3

Supplementary file 4


We extend our gratitude to Prof Norman Sartorius and colleagues in the WHOQOL Group, for long-term collaboration and support. We thank the UK research teams acknowledged in Skevington and McCrate (2012), who generously contributed their study data. We also thank Dr Irina Todorova and Dr Adriana Banozic for discussion on the Sustainable Development Goals, and Dr William Taylor for supplying data for effect size calculations (Taylor et al, 2004).




  • Handling editor Seye Abimbola

  • Contributors A preliminary systematic review and manuscript draft was carried out by SS. TE contributed to the final systematic review, conducted the meta-analysis and presented its results. The authors jointly wrote the final draft.

  • Funding The research was conducted by the authors while employed by the University of Manchester.

  • Disclaimer The author(s) is(are) staff member(s) of the World Health Organization. The author(s) alone is(are) responsible for the views expressed in this publication and they do not necessarily represent the views, decisions or policies of the World Health Organization.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The information for this systematic review is already widely available in publications and other documents.