Original research

Global Dietary Database 2017: data availability and gaps on 54 major foods, beverages and nutrients among 5.6 million children and adults from 1220 surveys worldwide

Abstract

Background We aimed to systematically identify, standardise and disseminate individual-level dietary intake surveys from up to 207 countries for 54 foods, beverages and nutrients, including subnational intakes by age, sex, education and urban/rural residence, from 1980 to 2015.

Methods During 2008–2011 and 2014–2020, the Global Dietary Database (GDD) project systematically searched for surveys assessing individual-level intake worldwide. We prioritised nationally or subnationally representative surveys using 24-hour recalls, food-frequency questionnaires or short standardised questionnaires. Data were retrieved from websites or corresponding members as individual-level food group microdata or aggregate stratum-level data. Standardisation included quality assessment; data cleaning; categorisation of foods and nutrients and their units; aggregation by demographic strata; and energy adjustment.

Results We standardised and incorporated 1220 surveys into the final GDD 2017 database, together representing 188 countries and 99.0% of the world’s population in 2015. 72.1% were nationally, 17.0% subnationally, and 10.9% community-level representative. 41.2% used food-frequency questionnaires; 23.4%, 24-hour recalls; 15.8%, Demographic Health Survey questionnaires; 13.1%, biomarkers and 6.4%, household surveys. 73.9% of surveys included data on children; 52.2%, by urban and rural residence; and 30.2%, by education. Most surveys were in high-income countries, followed by sub-Saharan Africa and Asia. Most commonly ascertained foods were fruits (N=803 surveys), non-starchy vegetables (N=787) and sugar-sweetened beverages (N=440); and nutrients, sodium (N=343), energy (N=256), calcium (N=224) and fibre (N=200). Least available data were on iodine, vitamin A, plant protein, selenium, added sugar and animal protein.

Conclusions This systematic search, retrieval and standardisation effort provides the most comprehensive empirical evidence on dietary intakes across and within countries worldwide.

Key questions

What is already known?

  • Comparable and standardised global data on intakes of foods, beverages and nutrients relevant to maternal–child health and chronic diseases have not traditionally been available across nations nor key subnational subgroups.

What are the new findings?

  • Through systematic searches and collaboration with investigators worldwide, we retrieved and standardised 1220 surveys of nationally or subnationally representative data on individual-level dietary intakes from 188 countries/territories around the world.

  • Most nationally or subnationally representative surveys were identified in high-income countries, followed by sub-Saharan Africa, Asia, Former Soviet Union and Latin America/Caribbean. Middle East/North Africa and South Asia were more data sparse.

  • Among foods, data on fruits, vegetables and sugar-sweetened beverages were most available; among nutrients, on sodium, energy, calcium and fibre. Data on iodine, vitamin A, plant protein, selenium, added sugar and animal protein were most sparse.

  • Less than one-third of surveys had dietary intake data on infants (age 0 to <2 years), young children (2 to <6 years), school age children (6 to 10 years), older adults (70+ years) and pregnant/lactating women or by education level.

What do the new findings imply?

  • These identified, collected and standardised data in the Global Dietary Database 2017 provide the most comprehensive empirical evidence on dietary intakes across and within countries worldwide.

Introduction

Diet is critical to both human1 2 and planetary health.3 4 Comprehensive and reliable evidence on individual dietary intakes in all nations of the world is essential for evaluating diet-related burdens for maternal–child health (MCH) and non-communicable diseases (NCDs), as well as for understanding population-level disparities, food costs and affordability, environmental sustainability, and progress toward key aims. For example, the recognition that global diets are relevant to 12 of 17 of the United Nations (UN) Sustainable Development Goals has led to ongoing planning for the first-ever UN Food Systems Summit, scheduled for 2021.5 Given large potential variation within nations, such data are also crucial to provide empirical evidence of dietary habits in key subgroups, such as by age, sex, socioeconomic status and urban or rural residence. Data on habitual dietary intakes in diverse world regions and subpopulations are also essential to inform the potential impact of acute shocks such as the COVID-19 pandemic.

Unfortunately, little data have been systematically identified, collated and standardised on dietary habits worldwide. Available instruments have assessed food commodities or household expenditures, which are not reflective of individual dietary intakes; for example, the UN Food and Agriculture Organization (FAO) Food Balance Sheets (FBS)6 of estimated national food availability, or the World Bank’s Living Standard Measurement Survey7 and other household expenditure instruments which assess household-level food purchasing (but not foods produced by the household, purchased outside the home or actual individual intakes). These available data sources also do not assess heterogeneity within nations, such as by demographic subgroups that may vary in both dietary intakes and disease risk. Numerous national or subnational health and nutrition surveys at the individual level have been conducted around the world, but these are seldom standardised or comparable across countries, time, dietary factors or demographic groups,8 and many are not publicly available.

To address these gaps, the Global Dietary Database (GDD) project was created to comprehensively identify, compile and standardise individual-level data on dietary factors relevant to health.9 The first iteration, GDD 2010, developed systematic methods to compile data from 527 dietary surveys (including urine biomarker surveys) from 116 countries/territories, representing 88.7% of the global adult population in 2010.10 The GDD 2010 represented a major advancement over previously available data, facilitating novel assessments of global dietary habits, trends and patterns,11–15 burdens of diet-related illness16–19 and diet-related sustainability concerns.20 21 In addition, the GDD 2010, together with its associated systematic characterisation of diet-disease aetiological effects and optimal intake levels,22 formed the foundation for the 2010 and 2013 Global Burden of Diseases Study risk estimates of diet-related health burdens.16 23 These findings confirmed for the first time, for example, that poor diet has overtaken tobacco smoking as the leading cause of preventable death in the world.23

Yet, key limitations remained. GDD 2010 focused on major dietary risks for NCDs, but excluded other dietary components, such as micronutrients, relevant for MCH. GDD 2010 also focused on adults (age ≥20 years), with little data on dietary intakes in children or youth. While GDD 2010 provided the first global data stratified by age and sex within nations, it did not include other subnational stratifiers likely to influence diets such as socioeconomic status or urban versus rural residence. GDD 2010 also identified data sparsity in certain regions of the world. To address these gaps, update available data, and advance the characterisation of dietary intakes worldwide, the GDD 2017 systematically identified, collected and standardised additional dietary surveys, including new information on many more foods, beverages and nutrients; an expanded age focus to include infants, children and adolescents; and further joint stratification by age, sex, education level, urban/rural residence and pregnancy/lactation status within nations. This analysis reports on data availability, and corresponding gaps, of global dietary data.

Methods

Prioritisation of dietary surveys

Our methods for GDD 2010 have been reported.10 14 15 17 24 Briefly, we conducted systematic searches of multiple electronic databases and extensive personal communications with experts and authorities worldwide to identify and obtain individual-level dietary surveys globally. We focused on quantitative data on dietary consumption of 21 foods, beverages and nutrients in 16 age-specific and sex-specific subgroups among adults (age ≥20 years) in up to 116 nations across 21 geographical regions between 1980 and 2010. For GDD 2017, we performed additional systematic electronic searches together with extensive communications with 644 data owners worldwide to identify further public and nonpublic data sources on individual-level dietary intakes. We prioritised nationally or subnationally representative surveys, with a special focus on previously identified data sparse low-income and middle-income countries.

We searched for surveys that collected quantitative dietary intake information on one or more of 54 foods, beverages, nutrients or dietary indices (table 1). These were selected and defined based on evidence for relationships with MCH or NCDs as well as clinical and policy interests in their intakes. We also searched for surveys on four additional dietary factors (animal protein excluding dairy protein, dairy protein, glycaemic index and glycaemic load), but identified too few available surveys to include these in the GDD 2017. We prioritised surveys with individual-level assessments using standardised 24-hour recalls, food-frequency questionnaires (FFQ) or short standardised questionnaires (eg, Demographic Health Survey (DHS) questionnaires). Household-level surveys were considered if individual-level surveys were not available in a country. For assessment of dietary sodium and iron, we also searched for and included biomarker surveys measuring 24-hour urinary sodium excretion or blood haemoglobin concentrations.

Table 1
Definitions and units of dietary variables and biomarkers included in the GDD 2017

Searches for published and publicly available dietary surveys

We systematically searched online global and regional databases including PubMed, Embase, Web of Science, LILACS, African Index Medicus and the Southeast Asia Index Medicus. Search terms included: ‘nutrition’ OR ‘diet’ OR ‘food habits’ OR ‘nutrition surveys’ OR ‘diet surveys’ OR ‘food habits’[mesh] OR ‘diet’[mesh] OR ‘nutrition surveys’[mesh] OR ‘diet surveys’[mesh] AND (‘country of interest’). Additional search terms were applied to refine or expand each country search, as appropriate (online supplemental text S1, online supplemental table S3). Data searches encompassed 207 countries/territories, nested within seven world regions, with special focus on subregions previously identified as data-sparse (sub-Saharan Africa, Oceania, Southeast Asia, South Asia, Central Asia, Latin America/Caribbean, Middle East/North Africa).10 14 Because few identified publications reported dietary intake data according to comparable definitions or stratified by all relevant subgroups, we used published articles to identify data owner contacts. As done for GDD 2010,10 24 data owners were invited to join the GDD as a corresponding member (CM), which involved contributing their expertise and survey data through standardised electronic forms. GDD 2010 CMs were also invited to update previously submitted data with the new age group, dietary factor and other subgroup strata, as well as to share any newly collected data. A detailed contact and communication algorithm was used to maximise responsiveness and participation (online supplemental text S2). Each CM registered their survey and its characteristics, completed a data sharing agreement, and uploaded the data as individual-level data (preferred) or stratum-level data based on standardised dietary factor and strata definitions.
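As an illustration of how the search string quoted above could be assembled for each country, the following minimal sketch builds the Boolean query in Python. The function name `build_query` and the explicit parentheses around the OR block are assumptions made for this example; the exact syntax submitted to each database is described only in the online supplement, and the country-specific refinements are not reproduced here.

```python
# Free-text and MeSH-tagged terms copied from the search string quoted in the text.
FREE_TEXT_TERMS = ["nutrition", "diet", "food habits", "nutrition surveys", "diet surveys"]
MESH_TERMS = ["food habits", "diet", "nutrition surveys", "diet surveys"]


def build_query(country: str) -> str:
    """Return one Boolean search string for a single country of interest.

    Hypothetical helper: grouping the OR terms in parentheses is an
    assumption, added so that AND applies to the whole topic block.
    """
    free_text = [f"'{term}'" for term in FREE_TEXT_TERMS]
    mesh = [f"'{term}'[mesh]" for term in MESH_TERMS]
    topic_block = " OR ".join(free_text + mesh)
    return f"({topic_block}) AND ('{country}')"


if __name__ == "__main__":
    print(build_query("Kenya"))
```

Looping this helper over a list of country names would give one query per country, to which additional refining terms could then be appended by hand.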

Relevant publicly available surveys were identified using systematic database searches for health and nutrition surveys, as well as communication with our global CM network. For each potentially relevant public survey, data codebooks were screened for inclusion, and eligible surveys downloaded with prioritisation of data-sparse regions and nations as well as large (populous) nations.

Survey screening and inclusion

Identified published articles were screened by title and abstract and, for all potentially relevant articles, screened as full text by a single reviewer. A random subset of articles from each database and geographic region was screened by a second reviewer to ensure consistency and accuracy. When published reports did not contain data in the necessary format (the most common case), data owners were invited to become CMs and share their data. Datasets and survey documentation submitted by CMs and those from public surveys were reviewed again by a third reviewer to ensure survey inclusion criteria were met. We prioritised nationally or subnationally representative surveys whenever available. When no such surveys were identified for a nation, we allowed community-level surveys, and then household-level surveys, if these were judged to be representative of the community; that is, such surveys were excluded if focused on special populations (eg, people with specific disease conditions) or cohorts (eg, people of a certain profession or dietary pattern).

Data retrieval and assessment

Standardised protocols were used to identify, extract and analyse data in a systematic and comparable manner. For CM-provided surveys, survey characteristics were retrieved using a standardised electronic form, including data on survey name, country, years performed, sampling methodology, response rate, national representativeness, level of data collection (individual or household level), dietary assessment method and validation, sample size, population demographics (age, sex, education, urban/rural residence, pregnancy/lactation status), and definitions and measurement units of dietary factors. Individual-level microdata were retrieved as SAS, STATA, SPSS v.25 or Microsoft Excel files. Aggregate stratum-level dietary intake data were collected using standardised electronic spreadsheets, including data on stratum sample size and means, SD, and 10th, 25th, 50th, 75th and 90th percentiles of intake for each dietary factor, jointly stratified by age, sex, education and urban/rural strata, as available. Data from publicly available surveys were retrieved using a similar standardised electronic spreadsheet as for CM surveys. Random double checks of data retrievals were performed to ensure correct extraction of publicly available surveys. When the same study collected information across multiple countries, data for each country were separated and counted as a separate survey for reporting purposes.
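The stratum-level summary described above (sample size, mean, SD and the 10th, 25th, 50th, 75th and 90th percentiles) can be sketched as follows. This is a minimal, unweighted illustration: the record layout, the `summarise_strata` name and the choice of the "inclusive" percentile method are assumptions for this example, and survey sampling weights, which real surveys would normally require, are not handled.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev

# Percentiles requested in the standardised stratum-level spreadsheets.
PERCENTILES = (10, 25, 50, 75, 90)


def summarise_strata(records, keys=("age_group", "sex")):
    """Aggregate individual-level intakes into per-stratum summaries.

    `records` is an iterable of dicts holding the stratifier fields named
    in `keys` plus an "intake" value (hypothetical layout, unweighted).
    """
    groups = defaultdict(list)
    for record in records:
        stratum = tuple(record[k] for k in keys)
        groups[stratum].append(record["intake"])

    summaries = {}
    for stratum, values in groups.items():
        if len(values) < 2:
            continue  # SD and percentiles need at least two observations
        # 99 cut points; index p - 1 holds the p-th percentile.
        cuts = quantiles(values, n=100, method="inclusive")
        summary = {"n": len(values), "mean": mean(values), "sd": stdev(values)}
        for p in PERCENTILES:
            summary[f"p{p}"] = cuts[p - 1]
        summaries[stratum] = summary
    return summaries
```

Passing additional stratifier names in `keys` (for example education and urban/rural residence) would give the joint stratification described in the text, as available in each survey.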

Data standardisation

Standardisation included data quality assessment, standardised categorisation of foods, beverages and nutrients and their units, aggregation by subgroup strata, energy adjustment and compilation into a relational database. Each dietary variable was characterised according to a standard definition and units (table 1). Surveys with varying definitions were classified using defined secondary definitions. When multiple days of dietary intakes were collected (eg, diet recalls or records), these were averaged for each individual. Semi-quantitative instruments (eg, FFQs based on a single specified portion size) and short standardised questionnaires (eg, DHS surveys) were converted to standard serving sizes for each frequency category. Household-level data were converted to individual-level intakes within each household using Adult Male Equivalents,25 which account for the household composition and differing energy intakes by age and sex of household members. Based on national estimated average requirements26–28 and observed population intakes,29 all intakes were adjusted to 700 kcal/day for ages 0 to <1 years, 1000 kcal/day for ages 1 to <2 years, 1300 kcal/day for ages 2–5 years, 1700 kcal/day for ages 6–10 years, 2000 kcal/day for ages 11–74 years and 1700 kcal/day for ages 75+ years (online supplemental text S3). Individual-level microdata were aggregated into subgroups jointly stratified by age (0–5, 6–11 and 12–23 months; and 2–4, 5–10, 11–14, 15–19, 20–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, 85–89, 90–94 and 95+ years), sex, education (≤6 years of education, 6.01–12 years or ≥12.01 years; and for children, head of household’s educational attainment), urban/rural residence, and pregnancy/lactation status, as available.
Urban versus rural residence was defined according to each survey’s established definition, due to the absence of any single global definition of these factors as well as logistical challenges in aiming to revise each survey’s existing definitions. Education was selected as the most widely available and standardised metric of socioeconomic status, as compared with income or wealth indices, which are not always reported similarly or accurately across countries.
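The age-specific energy targets listed above can be expressed as a small lookup, sketched below. The targets themselves are taken from the text; the proportional scaling used in `energy_adjust` is an assumption chosen for illustration only, since the adjustment procedure actually applied is documented in online supplemental text S3 and may differ (for example, a residual method). Treating "2–5 years" as ages 2 to <6, and similarly for the other integer ranges, is also an assumption.

```python
# (exclusive upper age bound in years, target kcal/day), taken from the text.
ENERGY_TARGETS = [
    (1, 700),
    (2, 1000),
    (6, 1300),    # "2-5 years" read as 2 to <6 (assumption)
    (11, 1700),   # "6-10 years" read as 6 to <11 (assumption)
    (75, 2000),   # "11-74 years" read as 11 to <75 (assumption)
    (float("inf"), 1700),
]


def target_energy(age_years: float) -> int:
    """Return the standardised energy target (kcal/day) for a given age."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    for upper_bound, kcal in ENERGY_TARGETS:
        if age_years < upper_bound:
            return kcal
    raise ValueError("age could not be classified")


def energy_adjust(intake: float, reported_kcal: float, age_years: float) -> float:
    """Scale an intake to the age-specific energy target.

    Simple proportional (density-style) scaling, used here only as an
    illustrative assumption, not as the documented GDD procedure.
    """
    if reported_kcal <= 0:
        raise ValueError("reported energy intake must be positive")
    return intake * target_energy(age_years) / reported_kcal
```

For example, under this assumed scaling, a 30-year-old reporting 150 g/day of fruit at 2500 kcal/day would be standardised to 150 × 2000 / 2500 = 120 g/day.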

Quality control and data management

Data integrity and quality were assessed at each step during survey collection, processing, standardisation and analyses. Duplicate reviews were performed of recorded survey characteristics, demographic variables, dietary definition classifications and unit conversions. To assess for outliers and validity (errors) in reported intakes, plausibility thresholds were defined for each dietary factor, both at the individual level and stratum (eg, group mean) level, based on dietary reference intakes, tolerable upper limits, toxicity ranges and existing regional data on mean intakes in populations (online supplemental tables S9–S14). Any value identified as potentially implausible was reviewed for extraction errors, followed by direct correspondence with the CM or public survey data owners, to detect and correct potential errors. Data remaining implausible after such steps were excluded from final datasets. Results for each dietary factor were further graphed and visually inspected by country, age, sex, dietary assessment method, representativeness and time period, reviewing survey result plausibility and consistency within and across countries.
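The plausibility screen described above amounts to flagging values that fall outside factor-specific bounds so that they can be reviewed. A minimal sketch follows; the function name and the numeric limits are placeholders invented for this example, not the thresholds in online supplemental tables S9–S14.

```python
# Placeholder bounds for illustration only; NOT the published thresholds.
ILLUSTRATIVE_LIMITS = {
    "fruits_g_per_day": (0.0, 2000.0),
}


def flag_implausible(values, lower, upper):
    """Return the positions of values outside the closed range [lower, upper].

    Flagged values are candidates for review (extraction check, then
    correspondence with the data owner), not automatic exclusions.
    Missing values encoded as NaN are also flagged, because every
    comparison with NaN is false.
    """
    return [i for i, v in enumerate(values) if not (lower <= v <= upper)]


if __name__ == "__main__":
    low, high = ILLUSTRATIVE_LIMITS["fruits_g_per_day"]
    print(flag_implausible([100.0, -5.0, 2500.0, 300.0], low, high))
```

The same check could be applied at both levels mentioned in the text, once to individual intakes and once to stratum means, using a separate pair of bounds for each.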

Data analyses were performed using SAS V.9.4 (SAS Institute), Stata V.14.0 (StataCorp) and RStudio V.1.1.453 (RStudio, Massachusetts, USA). Data files were organised using Microsoft Access 2010 relational database (Microsoft, Redmond, Washington, USA), linking survey characteristics, data owner institutional information, and survey processing and standardisation details. SQL queries were developed to deduplicate data, summarise survey characteristics and calculate survey quality scores.

Modelling and imputation

In addition to identifying, collecting, standardising and disseminating survey-level data, the GDD 2017 uses advanced imputation modelling to account for differences in survey design, representativeness, dietary assessment methods and dietary factor definitions, as well as uncertainty in dietary estimates and missingness, to estimate stratum-specific mean dietary intakes jointly stratified by age, sex, education and urban/rural residence (N=240 strata per country year), for each of 188 countries/territories per year between 1990 and 2018. Our modelling methods for GDD 2010 have been reported,11 15 30 31 and the updated modelling methods and findings for GDD 2017 will be the focus of a forthcoming paper.

Patient and public involvement statement

There was no public involvement in the study; we used publicly available or privately held data for the analysis.

Results

Survey identification, retrieval and inclusion

Our new systematic searches identified 3062 potentially relevant published abstracts (figure 1). Of these, 221 eligible surveys were identified from 144 CMs, of which 167 surveys (76%) were retrieved from CMs, analysed, cleaned and standardised by the time of database lock on 31 March 2020. Among 1372 potentially relevant publicly available surveys, 544 eligible surveys were retrieved, analysed, cleaned and standardised. Including the prior surveys identified in GDD 2010, the final GDD 2017 dataset included 1220 surveys.

Figure 1

Flow chart depicting the Global Dietary Database (GDD) process for data identification and retrieval. GDD 2010 refers to the iteration conducted between 2008 and 2011, GDD 2017 new data from corresponding members (CMs) refers to surveys contributed by a data owner during the 2014–2020 iteration, GDD 2017 new data from public sources refers to surveys retrieved from publicly available databases during the 2014–2020 iteration.

Characteristics of dietary surveys

The 1220 surveys were from 188 countries/territories, together representing 99.0% of the global population in 2015 (table 2). About 7 in 10 (70.8%) were retrieved from public sources, and the remainder (29.2%) from CMs. 72.1% were nationally representative (representing 99.6% of the global population in 2015), 17.0% were subnationally representative (81.6% of the global population in 2015), and 10.9% were community level (70.2% of the global population in 2015) (more than one type of survey could be included for individual nations). The great majority of surveys (92.5%) were collected at the individual level. About one-third (36.1%) were collected before 2000, and the rest (63.9%) in 2000 or later. Most (58.6%) were retrieved by the GDD as individual-level microdata, and others as aggregated, standardised, stratum-level intakes.

Table 2
Characteristics of 1220 dietary surveys in the GDD 2017

Dietary instruments included FFQs (41.2% of surveys; representing 94.3% of the global population in 2015), 24-hour recalls (23.4%; 78.0%), DHS questionnaires (15.8%; 53.9%), biomarkers (13.1%; 70.7%) and household surveys (6.4%; 15.1%) (more than one type of survey could be included for individual nations) (table 2). Most surveys included data on children and adolescents (age 0–19 years; 73.9%); about two-thirds of surveys (64.5%) included data on adults (age 20+ years). More than half (52.2%) included data that specified urban or rural residence of individuals, including 4.7% urban only and 1.4% rural only. Data on education level or pregnancy/lactation status of participants were available in 30.2% and 11.2% of surveys, respectively.

Compared with CM-provided surveys, public surveys were more often nationally representative (76.9% vs 60.7%), conducted before 2000 (42.0% vs 21.6%) and available as individual-level microdata (65.6% vs 41.6%), and more likely to be household level (7.8% vs 0.8%) (table 2). Many more public vs CM surveys used DHS questionnaires (22.0% vs 0.8%) or biomarkers (17.8% vs 1.7%), and many fewer used 24-hour recalls (11.1% vs 53.4%). Excluding biomarker studies, the median (5th, 95th percentile) number of dietary factors per survey was 6.0 (1.0, 30.0), and was lower in public surveys (3.0; 1.0, 14.0) vs CM surveys (11.0; 2.0, 47.0).

Data availability by world region

The largest number of surveys were from high-income countries (N=386), followed by sub-Saharan Africa (N=210), Asia (N=180), Former Soviet Union (N=158), Latin America/Caribbean (N=135), Middle East/North Africa (N=95) and South Asia (N=56) (table 3). In all regions except South Asia, most surveys were nationally representative. The great majority of surveys used either 24-hour recalls or FFQs in all world regions except sub-Saharan Africa, where nearly half of all available surveys were DHS questionnaires (which assess only children age 0–5 years and their mothers and women of reproductive age, 15–49 years).

Table 3
Characteristics of 1220 dietary surveys in the GDD 2017, by world region*

Most surveys from high-income countries and Asia were available as aggregated stratum-level distributions; and in other world regions, as individual-level microdata. The highest number of dietary factors per survey was in South Asia ((median; 5th, 95th percentile) 10.0; 2.0, 26.7); and lowest, in the Middle East/North Africa (3.0; 1.0, 47.0).

Data availability by nation

By country, the USA, Japan, Great Britain, Finland and China had the largest number of surveys (N>25 surveys each) (figure 2). Many countries in sub-Saharan Africa, Latin America/Caribbean, Middle East/North Africa, Former Soviet Union, and Asia had fewer than five surveys, with notable exceptions being China and Japan (mentioned above), Russia (N=22), Iran (N=20), Hungary (N=16), Poland (N=16) and Colombia (N=15) (figure 2). Eligible surveys were not identified for 19 of 207 world countries (Afghanistan, Andorra, Bermuda, Cuba, Democratic People’s Republic of Korea, Djibouti, Equatorial Guinea, French Polynesia, Guadeloupe, Guinea-Bissau, Hong Kong, Macau, Martinique, Netherlands Antilles, Nicaragua, Palau, Réunion, Somalia, Turkmenistan), together representing 1% of the world population in 2015.

Figure 2

Geographical density of the number of dietary surveys in the GDD 2017 by country (A), including publicly available surveys (B) and non-public surveys submitted by data owners (corresponding members) (C). GDD, Global Dietary Database.

Data availability by dietary factor

The most frequently evaluated factors were fruits (N=803 surveys, together representing 96.2% of the global population in 2015), non-starchy vegetables (N=787, 95.8%), sugar-sweetened beverages (N=440, 83.7%), total milk (N=413, 92.1%), unprocessed red meat (N=386, 92.3%), and beans/legumes (N=358, 90.1%) (figure 3). Fewer than 150 surveys were identified for coffee (N=122, 31.3%), cheese (N=121, 59.9%), tea (N=112, 60.9%), whole fat milk (N=95, 30.7%) and reduced fat milk (N=93, 23.5%), although these surveys still represented meaningful proportions of the global population. Among macronutrients, dietary fibre (N=200, 79.1%), saturated fat (N=173, 74.9%), and dietary cholesterol (N=155, 72.4%) had the largest number of surveys. Vitamins and minerals with the greatest number of surveys included sodium (N=343, 81.3%), calcium (N=224, 78.3%), iron (N=113, 55.7%) and vitamin C (N=106, 50.5%) (figure 4). Fewest surveys were identified for iodine, vitamin A, plant protein, selenium, added sugar and animal protein (N<45 surveys each). Notably, with the exception of sodium and calcium, ≥91% of all surveys including information on vitamins or minerals were obtained from non-public (CM) sources.

Figure 3

Number of surveys with dietary intake data by dietary factor and source in the GDD 2017. GDD 2010 refers to the iteration conducted between 2008 and 2011, GDD 2017 CM refers to surveys contributed by a data owner during the 2014–2020 iteration, GDD 2017 Public Survey refers to surveys retrieved from publicly available databases during the 2014–2020 iteration. CM, corresponding member; GDD, Global Dietary Database; SSB, sugar-sweetened beverage; veg, vegetables.

Figure 4

Number of surveys with dietary intake data on nutrients in the GDD 2017. GDD 2010 refers to the iteration conducted between 2008 and 2011, GDD 2017 CM refers to surveys contributed by a data owner during the 2014–2020 iteration, GDD 2017 Public Survey refers to surveys retrieved from publicly available databases during the 2014–2020 iteration. CM, corresponding member; GDD, Global Dietary Database; w/supplements, with supplements; w/o, without.

Discussion

In this systematic search for individual-level dietary intakes globally, we identified, retrieved and standardised 1220 surveys from 188 countries, together representing 99.0% of the world’s population in 2015. Most were nationally representative and collected at the individual level using an FFQ or 24-hour recall. A broad range of ages were represented, from birth to late in life; more than half (52.2%) included information on urban or rural residence; and about one-third (30.2%) on educational attainment. Nearly one-third of surveys (29.2%) were not publicly available and were retrieved directly from a global network of CMs; these were more likely to be recent, to use 24-hour recalls, and to be the primary source of information globally on intakes of vitamins, minerals, and other micronutrients. In sum, this GDD 2017 dataset represents the most comprehensive collection of individual-level dietary intake data worldwide.

Despite extensive searches, we found fewer surveys from South Asia, as well as fewer surveys using higher-quality dietary instruments (eg, 24-hour recalls, FFQs) from sub-Saharan Africa. Surveys in the latter region were most often DHS questionnaires, which collect qualitative information (yes/no) on intakes in the preceding day of a limited number of foods and beverages for infants, children <5 years old, their mothers and women of reproductive age. Unfortunately, countries in these regions have also been more likely to suffer prolonged conflict and economic shocks impacting food security and nutrition,32 rendering even greater the challenges as well as importance of collecting robust nutrition surveillance data.

Less than one-third of surveys each provided data on infants or children younger than 10 years, older adults (age 70+ years), or by pregnancy/lactation status in women. Publicly available (especially DHS) surveys were more likely to report data on young children and pregnant/lactating women, while nonpublic surveys were more likely to report data on older adults. Because nutritional requirements are especially sensitive in these subgroups, our findings demonstrate the global need for more nationally representative high-quality surveys, using 24-hour recalls or FFQs, in these special populations. In addition, while South Asia comprises about a quarter of the world’s population and has the highest global prevalence of stunting and wasting,33 our results demonstrate the fewest number of dietary surveys with data on children (age 0–19 years) in this region. These novel findings highlight specific key gaps in dietary surveillance in this region.

Among different dietary factors, the availability of surveys varied substantially. Generally, more surveys collected data on intakes of foods, especially fruits and vegetables; and fewer surveys estimated nutrient intakes. Importantly, relatively few public surveys reported on vitamins and minerals relevant to MCH such as iron, vitamin A, zinc, iodine, folate and vitamin B12, among others. We found that many public surveys used dietary instruments which estimate only a portion of the diet (eg, DHS questionnaires) and thus cannot estimate nutrient intakes; or included broader food assessments but without reliable and updated national food composition databases or food composition tables to estimate nutrients. This was identified to be particularly problematic in sub-Saharan Africa and Latin America/Caribbean, emphasising the need for increased collection of individual-level, nationally representative dietary data using 24-hour recalls or FFQs, together with creation or updating of food composition data, in these regions.

While 24-hour recalls are considered the best standard for assessing national and stratum-specific mean dietary intakes, their collection is more time and resource intensive than for FFQs, which also may better assess habitual intakes among each individual participant.34 35 Consistent with this, more global surveys used FFQs than 24-hour recalls. Both of these instruments are more valid than short dietary questionnaires, which collect far less detailed data on a handful of foods and beverages, creating more measurement error and far less coverage of the whole diet.36 However, short dietary questionnaires are easier and less expensive to administer, consistent with our findings that they are most common in low-resource nations. The GDD 2017 results highlight the unfortunate irony that micronutrient information is least available and valid where it matters most.

For specific factors, we identified and collected biomarker data, but valid biomarkers are not available for most dietary factors.37 We did not use household-level data except where individual-level data were not available, given the significant limitations of household-level data in assessing dietary intakes.8

Many prior efforts to estimate dietary intakes globally have used FAO FBS as primary data inputs.12 38 While FAO data represent powerful and useful annual estimates of national per capita availability of food commodities, they are not intended to capture, and are poor representations of, dietary intake.39 For example, they do not adequately capture unreported food waste, local food production and especially subnational heterogeneity in dietary habits among different population subgroups.6 40 The Global Burden of Diseases Study also estimates global diet. While the 2010 and 2013 cycles of the Global Burden of Diseases Study collaborated with the GDD 2010 for their global dietary estimates, the Global Burden of Diseases Study subsequently internalised its processes for estimating diets. Based on available publications, that study now uses FAO FBS estimates, national product sales data and household budget surveys as primary data inputs, adjusting using a single global regression against individual-level diet surveys from 67 countries.2 41 Because the relationship between national food availability and individual-level dietary intakes is known to vary significantly and jointly by age, sex, world region and other factors,39 such methods will not sufficiently capture the heterogeneity in national and subnational intakes across diverse countries. Compared with GDD 2017, which includes data on 54 dietary factors, the Global Burden of Diseases Study currently reports on 8 foods, 1 beverage and 6 nutrients. We and others have collaborated with Gallup, Inc., in their planning for potential standardised polling on dietary intakes.42 Such data, most likely based on short dietary questionnaires, will neither capture the full diet nor estimate nutrients but will complement DHS data and provide useful new inputs to the GDD global dietary modelling efforts.

Overall, while gaps and heterogeneity in data sources are evident, the GDD 2017 represents, to our knowledge, the most comprehensive and updated data on global dietary intakes. To maximise its benefits as a public resource, the GDD 2017 is now available for free public download at http://www.globaldietarydatabase.org. Survey-level information and original data download weblinks are provided for all public surveys; and survey-level microdata or stratum-level aggregate data, as available, are provided for direct download for all non-public (CM) surveys for which the data owners granted consent for public sharing (currently 81.9%). Importantly, the full modelled GDD 2017 data, which will leverage all surveys as primary data inputs together with survey- and country-level covariates to estimate the mean intakes of all 54 foods, beverages and nutrients within each of 240 subnational subgroups in 188 nations by year between 1990 and 2018, will also be available for free public download when finalised (estimated Spring of 2021).9 The GDD is also collaborating with the FAO/WHO Global Individual Food consumption data Tool (FAO/WHO GIFT) project43 and the European Food Safety Authority to jointly facilitate harmonisation of dietary datasets on a global scale, public dissemination of methods and dietary datasets, and global collaboration and capacity development with dietary data owners worldwide. We hope the individual survey microdata, standardised dietary datasets, and global modelled data will each serve as critical public resources for researchers, health agencies and governments to evaluate national and subnational dietary intakes and trends, diet-related health burdens and disparities, dietary costs and affordability, strains and options for sustainability, corresponding policy and intervention priorities, and strengths and gaps in dietary surveillance. For example, ongoing efforts to biofortify staple foods in low-income nations44 45 will require data on national and subnational (eg, by age, sex, education, rural/urban residence) intakes of those staple foods, as well as on existing national and subnational intakes of the targeted nutrients from other foods, to effectively plan and implement biofortification.

Our investigation has several strengths. We performed systematic global searches for dietary surveys and employed standardised methods for survey and dietary factor identification, retrieval, processing, checking, standardisation and analysis. We searched multiple online databases of published literature and publicly available data, including region-specific databases, with extensive additional contacts of data owners to identify dietary surveys. We focused searches on data-sparse world regions to improve the characterisation of diet in these populations and identify remaining dietary surveillance needs. We collected individual-level microdata or aggregate stratified data by key demographic subgroups, providing critical information on dietary heterogeneity within nations. We collected data on 54 foods, beverages and nutrients, providing the most complete available information on overall diets. To maximise consistency and comparability, we performed standardised data extraction and analysis, including data quality assessments, standardised food definitions and units, energy adjustment of intakes to age-appropriate levels, and assessment of outliers and plausibility.

Limitations should be considered. Identified surveys used varying designs and instruments. To address this, GDD 2017 followed a rigorous process to document each survey’s methods and the standardisation steps applied, improving the comparability of the data. Due to the breadth and scope of data collected and standardised, we focused on food categories (eg, fruits) rather than individual foods (eg, apples). Yet, food categories have been most often assessed in relation to MCH and NCD outcomes, and individual-level microdata are available in GDD for future assessments of more granular dietary categories. Four identified food categories (poultry; dairy-based desserts; candy and sweeteners; and cakes, cookies and other baked goods) were excluded from our original assessment design; we hope to capture these categories in future iterations of the GDD. Not all potentially relevant dietary surveys could be retrieved, due to difficulties in accessing certain publicly available surveys and logistical challenges in contacting and engaging data owners.

In summary, the GDD 2017 identified, collated and standardised 1220 dietary surveys across 188 countries/territories globally, providing a public resource of data on 54 dietary factors in children and adults over time, nationally and subnationally by age, sex, urban/rural residence, education and pregnancy/lactation status; as well as identifying specific gaps for accelerated surveillance.