Scoping future outbreaks: a scoping review on the outbreak prediction of the WHO Blueprint list of priority diseases
  1. Nils Jonkmans1,
  2. Valérie D’Acremont1,2,
  3. Antoine Flahault3
  1. 1Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
  2. 2Swiss Tropical and Public Health Institute, Basel, Switzerland
  3. 3Institute of Global Health, Faculty of Medicine, Université de Genève, Geneva, Switzerland
  1. Correspondence to Mr Nils Jonkmans; nils.jonkmans{at}


Background The WHO’s Research and Development Blueprint priority list designates emerging diseases with the potential to generate public health emergencies for which insufficient preventive solutions exist. The list aims to reduce the time to the availability of resources that can avert public health crises. The current SARS-CoV-2 pandemic illustrates that an effective method of mitigating such crises is the pre-emptive prediction of outbreaks. This scoping review thus aimed to map and identify the evidence available to predict future outbreaks of the Blueprint diseases.

Methods We conducted a scoping review of PubMed, Embase and Web of Science related to the evidence predicting future outbreaks of Ebola and Marburg virus, Zika virus, Lassa fever, Nipah and Henipaviral disease, Rift Valley fever, Crimean-Congo haemorrhagic fever, Severe acute respiratory syndrome, Middle East respiratory syndrome and Disease X. Prediction methods, outbreak features predicted and implementation of predictions were evaluated. We conducted a narrative and quantitative evidence synthesis to highlight prediction methods that could be further investigated for the prevention of Blueprint diseases and COVID-19 outbreaks.

Results Of 3959 unique articles screened, we included 58 based on the inclusion criteria. Five major prediction methods emerged, the most frequent being spatiotemporal risk maps that predict outbreak risk periods and locations from vector and climate data. Stochastic models were predominant. Rift Valley fever was the most predicted disease. Diseases with complex sociocultural factors such as Ebola were often predicted through multifactorial risk-based estimations. Only 10% of models were implemented by health authorities. No article predicted Disease X outbreaks.

Conclusions Spatiotemporal models for diseases with strong climatic and vectorial components, as in Rift Valley fever prediction, may currently best reduce the time to the availability of resources. A wide literature gap exists in the prediction of zoonoses with complex sociocultural and ecological dynamics such as Ebola, COVID-19 and especially Disease X.

  • SARS
  • viral haemorrhagic fevers
  • systematic review
  • geographic information systems
  • mathematical modelling

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key questions

What is already known?

  • The Blueprint list denotes diseases with the potential to cause severe public health emergencies for which there is an urgent need for accelerated research and development.

  • Outbreak prediction has previously been applied with success to diseases such as Rift Valley fever and influenza, for prevention and the pre-emptive implementation of health measures.

  • A systematic review of the outbreak prediction methods of the Blueprint diseases does not exist.

What are the new findings?

  • Explicit predictions of timing and locations of future outbreaks are most often carried out for diseases with strong climatic components such as Rift Valley fever.

  • The current literature in outbreak prediction of Blueprint diseases can be categorised into five domains: spatiotemporal modelling and risk mapping, time series forecasting and regression analysis, internet-based computing and phone-based predictions, narrative and qualitative models, and quantitative probabilistic models.

  • No articles in this review predicted an outbreak of a novel Disease X in 2019, including coronaviruses.

What do the new findings imply?

  • Prediction models of diseases with strong climatic components such as for Rift Valley fever may currently be most appropriate for policymakers in making explicit prediction statements.

  • Predictions based on outbreak receptivity and risk maps for diseases such as Ebola, Lassa and Crimean-Congo haemorrhagic fever identify outbreak risk factors targetable by policymakers.

  • A pressing need exists for research investment in outbreak prediction of Blueprint zoonoses and especially unknown future pathogens such as Disease X in order to tackle new pathogens such as SARS-CoV-2.

Introduction


In 2015, the member states of the WHO produced a Research and Development Blueprint list of priority diseases, diseases for which, ‘given their potential to cause a public health emergency and the absence of efficacious drugs and/or vaccines, there is an urgent need for accelerated research and development’.1 The list, updated in 2018, includes eight emerging pathogens (box 1). These diseases are caused by RNA viruses with moderate to severe case fatality rates and the potential for outbreaks with severe economic and health consequences.2–8 The Blueprint (BP) diseases are caused by zoonotic pathogens, which constitute about 60% of emerging infections in humans.9 Emerging infectious disease outbreaks are driven by accelerating urban development, socioeconomic disparities and encroachment on natural reservoirs, and the number of such outbreaks is predicted to increase.10

Box 1

2018 WHO Blueprint list of priority diseases

➢Ebola virus disease: the Ebola virus outbreak in West Africa in 2014–2016 primarily affected Sierra Leone, Guinea and Liberia. This outbreak caused 28 639 cases, a loss of 2.2 billion dollars, substantial losses in private and public sector growth and agricultural production, food security concerns and restrictions on the movement of people, goods and services.70 71

➢Zika virus disease: Zika virus has infected millions in the Americas since 2014 and has caused an increase in medical sequelae in some populations (congenital disease, Guillain-Barré) as well as socioeconomic disparities.72 73

➢Rift Valley fever (RVF): RVF outbreaks in South and Eastern Africa impacted the economy and public health of multiple countries, leaving behind long-term societal consequences.74

➢Lassa fever: Lassa fever, a severe haemorrhagic fever transmitted through rats, is estimated to infect millions in West Africa each year, with 20% of those patients experiencing severe multisystemic disease.75

➢Nipah and henipaviral disease: Nipah virus, discovered in 1998, is isolated to Malaysia, Singapore, Bangladesh and India. However, Nipah continues to pose a significant threat as a bat-based zoonosis with yearly spillover events, significant morbidity and case fatality rates of up to 100%.76

➢Crimean-Congo haemorrhagic fever: CCHF, a widely distributed tick-borne zoonosis, infects domestic and wild vertebrates as hosts, resulting in severe human disease and high mortality, and poses a continued threat to central Africa, South-Western Russia and central Asia.77

➢Severe acute respiratory syndrome (SARS) due to SARS-CoV-1: a coronavirus causing SARS emerged in 2003 and rapidly spread from Southeast Asia to Canada. SARS causes severe atypical pneumonia, leading to high morbidity, mortality and major economic consequences.78

➢Middle East respiratory syndrome (MERS): another coronavirus, MERS-CoV, emerged in the Arabian Peninsula in 2012 and has been found in Europe, North America and Asian countries with a mortality of about 30%.79

➢Disease X: disease X is a term used to ‘enable cross-cutting R&D preparedness that is relevant for currently unknown diseases’.11 Disease X was included to stimulate research into emerging pathogens, before their formal discovery.

While the Blueprint (BP) list ‘does not aim to predict the next epidemic’,11 its aim is to further the ability ‘to reduce the time lag between the identification of a nascent outbreak and approval of the most advanced products that can be used to save lives and stop larger crises’.1 The most effective method to reduce the time lag between an outbreak and implementation of public health measures may be to anticipate future outbreaks. Analysis of viral outbreaks is complex, frequently relying on a mixture of ground-level epidemiological data, advanced mathematical and statistical analysis and the inherent stochasticity of disease processes.12 While the complexity and randomness in epidemics render prediction challenging, the inherent entropy barrier to prediction is ‘beyond the timescale of single outbreaks, implying [outbreak] prediction is likely to succeed’.13 Furthermore, challenges intrinsic to outbreak prediction should not discourage us from attempting to pre-empt outbreaks. Preventative efforts would involve public health measures such as vaccination, pesticide use and awareness sensitisation.14 COVID-19, similar in certain respects to Middle East respiratory syndrome (MERS) and Severe acute respiratory syndrome (SARS), has brought with it the consequences of a severe global pandemic and demonstrates the need for establishing a research and development pipeline to predict the next Disease X.15 16 As the BP diseases lack a dedicated research pipeline, we are ill-equipped to deal with outbreaks of these priority pathogens.17 The COVID-19 pandemic underlines that pre-emptive implementation of preventive public health measures in areas at risk of future outbreak is of social, health and economic importance. Influenza also demonstrates this point with a successful history of surveillance, forecasting and epidemic modelling enabling prediction of future epidemics, which has guided pre-emptive vaccination efforts and other preparedness measures.18 19

We therefore undertook a scoping review to identify and characterise studies attempting to predict future outbreaks of the BP diseases. A scoping review was selected as the preferred method, as it enables a broader search strategy and research question, which is useful in an ill-defined field of research.20 Furthermore, it allows integration of new findings and research developments during the review process. A frequent goal of scoping reviews is also to map the existing literature, without drawing far-reaching conclusions. Our objective was thus twofold. First, we sought to map the literature concerning outbreak prediction: which aspects of future outbreaks were predicted and which methods and data sources were used to make these predictions. Second, we attempted to synthesise the data in a manner useful for policymakers, in order to highlight prediction methods that may warrant future exploration and implementation.

Aims and research question

The aim of this scoping review was to evaluate and map the scientific literature and identify research gaps concerning outbreak prediction methods of the BP diseases. The question this scoping review will attempt to answer is: what methodologies, model types and data are used to predict future outbreaks? Furthermore, what scholarly consensus exists on predictive models for future outbreaks and how can these be leveraged by current health actors in pre-empting BP diseases, including COVID-19?

Methods


The Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist for scoping reviews was adopted.21 A scoping review is a form of ‘knowledge synthesis, [that] follow[s] a systematic approach to map evidence on a topic and identify main concepts, theories, sources, and knowledge gaps’.21 The main author developed the scoping review protocol, the eligibility criteria and the summative data extraction tables.

Literature search strategy

The search strategy was developed by the main author with the help of research staff at the Lausanne University Hospital Library in Switzerland and included a broad range of terms related to outbreak prediction and the Blueprint diseases through a combination of free text and Medical Subject Headings (MeSH) terms. The search terms were designed to identify literature that related the prediction of future outbreaks to the Blueprint diseases while minimising the identification of literature related to the modelling of current epidemics. We define prediction as the attempt to foresee outbreaks of a disease in a future location or timeframe during a non-epidemic epidemiological situation. Disease-related search terms were also identified using MeSH terms and their catalogued synonyms from the National Library of Medicine database.22 Disease-related, forecasting-related and prediction-related search terms were then combined and run in advanced search settings in the respective databases. The full search strategy is detailed in the supplementary material.


Three databases were used to identify relevant literature: PubMed, Web of Science and Embase. Grey literature was not searched. No hand searching was performed. No lower date limit was applied. Articles were searched up to and including 4 July 2019. This cut-off was applied because we sought to focus on the state of research immediately preceding COVID-19 and to exclude COVID-19’s influence on outbreak modelling of the Blueprint diseases.

Screening, study selection, and inclusion and exclusion criteria

Relevant literature identified through our search strategy was extracted from the aforementioned databases, then transferred to EndNote X9 (Clarivate Analytics) for deduplication. Following deduplication, preliminary article selection was carried out through a modified two-reviewer screening system using Rayyan, an online literature screening platform. The main author screened titles and abstracts of all articles identified according to the eligibility criteria below and labelled articles as ‘included’, ‘excluded’ or ‘maybe’. During screening, full-text articles were occasionally retrieved and reviewed if the title or abstract did not provide enough information to decide whether the article ought to be included or excluded. A second researcher aided with reviewing articles labelled as ‘included’ and ‘maybe’ and provided expertise in the analysis of statistical/mathematical models. Where the reviewers disagreed on article selection, they debated the articles against the inclusion/exclusion criteria until consensus was reached.

Inclusion criteria

The review considered any studies predicting or forecasting future outbreaks through prediction of outbreak timing and/or location, outbreak risk maps, qualitative or quantitative outbreak risk, other risk assessments and other future epidemiological outbreak phenomena. Prediction in the chosen articles was defined either as an explicit statement by the authors stating ‘we believe at X time and/or Y location in the future, an outbreak of Z disease will occur’ or models containing either a quantitative (eg, percentage likelihood and numerical scale) or qualitative (eg, highly likely vs unlikely) risk of outbreak during a timeframe and/or location in the future. We required studies to denote some form of outbreak prediction in the abstract and title. We also included studies that predicted a future outbreak without mentioning a specific date/size of the outbreak/phenomena. We did not set a threshold degree of certainty for the predictions included. The review considered original quantitative and qualitative studies. There were no restrictions with regard to geographic location, population or study design.

Exclusion criteria

Articles were excluded if they met one or more of the following criteria:

  • Reviews, editorials, viewpoints and letters, duplicate studies and literature with a strong veterinarian focus not linked to public health.

  • Studies solely modelling current outbreaks of Blueprint diseases at the time of publishing, without predicting future phenomena.

  • Studies solely predicting outbreak risk factors.

  • In vivo and in vitro basic science models (eg, vaccine trials and animal models).

  • Purely descriptive epidemiological and ecological publications (eg, serological studies and risk factors) without prediction of future epidemiological changes.

  • Models that only examined causality of Blueprint diseases, rather than estimating risk or burden.

  • Languages other than English, Spanish, French or German.

  • Portable Document Format (PDF) not accessible.

Data extraction, synthesis and abstraction

After initial screening and study selection using Rayyan, the selected list of articles (n=123) was transferred to Papers for literature management, full-text retrieval and data extraction. A primary readthrough of articles was conducted, and data were extracted into a descriptive summative table synthesising study information (online supplemental material 1). Study information included the purpose of the study, the prediction method and key findings answering the scoping question, among other variables (online supplemental material 1). During this full-text analysis, a further 65 articles were removed according to the exclusion criteria. The final list consisted of 58 articles (figure 1). A second round of article readthroughs was then undertaken in order to synthesise and categorise data quantitatively into a numeric table (online supplemental material 1). Definitions of the variables extracted and the complete quantitative and narrative analyses for each article are presented in the supplementary material.

Figure 1

PRISMA flow diagram of search strategy. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

A summative table was created highlighting the principal findings (table 1).

Table 1

Principal findings of outbreak prediction articles, by disease

Results


The database search identified 7042 articles. A total of 3083 articles were excluded as duplicate studies. Titles and abstracts of the remaining 3959 articles were screened for inclusion. One hundred and twenty-three abstracts met the inclusion criteria. Sixty-five articles were further excluded on full-text analysis. In total, 58 articles were retained (figure 1).
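
The selection counts above can be checked with simple arithmetic. The short Python sketch below uses only the figures reported in this paragraph; the variable names and the derived inclusion percentage are illustrative additions and do not appear in the original analysis.

```python
# Counts reported in the text (see figure 1).
identified = 7042          # records returned by the database search
duplicates = 3083          # records removed as duplicates
met_criteria = 123         # abstracts meeting the inclusion criteria
excluded_full_text = 65    # articles excluded on full-text analysis

screened = identified - duplicates
included = met_criteria - excluded_full_text

# Share of screened records that were finally retained (illustrative).
share_included = round(100 * included / screened, 1)

print(screened)        # 3959
print(included)        # 58
print(share_included)  # 1.5
```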

Most publications concerned RVF (36%), Zika (22%) and CCHF (14%). No publications were produced on Disease X. Ninety-six per cent of articles were published between 2000 and 2019. The most studied region was the African continent (48%). Outbreak prediction and forecasting strategies were mapped into five categories (table 2): (1) spatiotemporal modelling (43%) and risk mapping (45%), (2) time series forecasting (40%) and regression analysis (36%), (3) internet-based computing and phone-based systems (10%), (4) qualitative models (12%) and (5) other quantitative models (16%). Certain articles fell into multiple categories when prediction strategies were combined. Most model types were stochastic (60%) in nature. The most common data types used in predictions were case count (81%), climate/meteorological (67%), vector (53%) and sociodemographic data (41%). Future outbreaks were most commonly predicted by evaluating outbreak risk (62%), spatial (76%) and/or temporal predictions (67%). Future case numbers were predicted in 36% of studies, and 64% of articles concomitantly evaluated outbreak risk factors. A significant portion of articles studied environmental suitability to future outbreaks (34%). Of note, few articles evaluated climate change effects on outbreaks (5%). A synthetic table of the outbreak prediction and forecasting methods highlights our principal findings (table 1).

Table 2

Main outbreak prediction model themes

Spatiotemporal modelling and risk mapping

The most common outbreak prediction methods were risk mapping (45%) and spatiotemporal modelling (43%). Most often, disease–environmental relationships predicted future outbreak location through climatic colayers and vector–host data.23 Many articles (n=16) applied machine learning algorithms (maximum entropy and boosted regression trees) integrating climatic, socioeconomic, ecological and transportation data into niche models.24 Qualitative models such as the analytical hierarchy process were also applied to synthesise the scientific literature into RVF risk maps.25 Overall, the aspects most frequently predicted were outbreak timing and location, environmental transmission risk/susceptibility, outbreak hotspots, predictability of outbreaks, future outbreak risk and risk factors. The data types most often integrated were outbreak and case count, climatic, ecological and vector/host data. Risk mapping was frequently used for diseases such as RVF and Zika. Models of this domain were often proposed or even implemented as early warning systems based on climatic anomaly surveillance.
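
To illustrate how an analytical hierarchy process can turn pairwise judgements into weights for a risk map, the following minimal Python sketch derives weights with the row geometric mean approximation and applies them to a single map cell. It is not the method of any reviewed study: the three factors, the comparison values and the cell scores are hypothetical.

```python
def ahp_weights(pairwise):
    """Approximate AHP priority weights with normalised row geometric means."""
    n = len(pairwise)
    geo_means = []
    for row in pairwise:
        product = 1.0
        for value in row:
            product *= value
        geo_means.append(product ** (1.0 / n))
    total = sum(geo_means)
    return [g / total for g in geo_means]


def risk_score(weights, factor_scores):
    """Weighted overlay: combine factor scores (each scaled 0 to 1) for one cell."""
    return sum(w * s for w, s in zip(weights, factor_scores))


# Hypothetical factors: rainfall anomaly, vector habitat, livestock density.
# pairwise[i][j] states how much more important factor i is than factor j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
cell_scores = [0.9, 0.6, 0.2]  # hypothetical factor scores for one grid cell
cell_risk = risk_score(weights, cell_scores)

print([round(w, 3) for w in weights])
print(round(cell_risk, 3))
```

Repeating the weighted overlay for every cell of a grid would give a relative risk surface; in practice the weights would come from literature synthesis or expert elicitation rather than from the invented values used here.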

Statistical analysis: time series and regression models

Statistical analysis in prediction and forecasting most often used time series (40%) and regression analysis (36%). Most often, models took past case counts and risk factors as input and output a numeric value representing the prospective case count. Analytical tools such as autoregressive integrated moving average (ARIMA), generalised additive mixed and Markov switching models were used in predictions relating seasonality to incidence. For example, one approach coupled 12 years of dengue census data as Zika case surrogate data onto climate and demographic colayers in order to estimate future Zika incidence.26 CCHF and Zika predictions often applied the aforementioned methodologies. Time series of climatic risk factors, coupled to incidence, were also used to forecast future outbreak location and timing, case count and epidemic dynamics. For example, a distributed lag non-linear model was used to associate meteorological factors with outbreaks and predict outbreaks with a time lag of 20 weeks.27
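
The idea of coupling a lagged climatic driver to incidence can be sketched in a few lines of Python. The example below is a deliberate simplification, a single-lag linear regression fitted by ordinary least squares, and not the distributed lag non-linear model used in the cited study. The rainfall and case series are synthetic and the function names are hypothetical.

```python
def fit_lagged_ols(driver, cases, lag):
    """Fit cases[t] = a + b * driver[t - lag] by ordinary least squares."""
    if lag < 1:
        raise ValueError("lag must be at least 1 period")
    x = driver[:len(driver) - lag]
    y = cases[lag:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(driver, a, b, lag):
    """Predict the next `lag` periods from the most recent driver values."""
    return [a + b * d for d in driver[-lag:]]


# Synthetic data: cases follow rainfall two periods later.
rain = [10, 40, 80, 30, 20, 60, 90, 50]
lag = 2
cases = [0, 0] + [5 + 0.5 * r for r in rain[:-lag]]

a, b = fit_lagged_ols(rain, cases, lag)
predicted = forecast(rain, a, b, lag)
print(round(a, 2), round(b, 2))
print([round(p, 1) for p in predicted])
```

Because the driver is observed before the cases it influences, the forecast horizon equals the lag, which is what gives such models their lead time.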

Quantitative outbreak models

Quantitative risk models (16%) were conducted through a variety of methodologies. A probabilistic model based on worldwide incidence, transportation data, probability of arrival of infected travellers and entomological field data was used to estimate future outbreak likelihood in a large European city.28 A SARS metapopulation model assessed worldwide transportation networks to establish a global, between-country quantitative outbreak likelihood scale.29 Another SARS study coupled transportation data onto a Susceptible-Exposed-Infectious-Recovered (SEIR) framework to estimate future incidence.30 A Zika model applied an ecological study design, using socioeconomic data as a surrogate for unprotected sex, to establish locations of future outbreaks should a Zika introduction event occur.31 Quantitative aspects that were predicted included risk of outbreak, total population at risk of disease and projected epidemic size.
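
For readers unfamiliar with the SEIR framework mentioned above, the following Python sketch simulates a basic deterministic SEIR model for a single closed population using small Euler time steps. It omits the transportation coupling and the stochastic elements of the cited studies, and all parameter values are hypothetical.

```python
def simulate_seir(population, beta, sigma, gamma, initial_infectious, days, dt=0.1):
    """Deterministic SEIR model; returns one (S, E, I, R) tuple per day."""
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    trajectory = [(s, e, i, r)]
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Hypothetical parameters: transmission rate 0.5/day, 5-day latent period,
# 4-day infectious period (basic reproduction number beta / gamma = 2).
trajectory = simulate_seir(
    population=1_000_000, beta=0.5, sigma=0.2, gamma=0.25,
    initial_infectious=10, days=200,
)
peak_infectious = max(state[2] for state in trajectory)
final_recovered = trajectory[-1][3]
print(round(peak_infectious))
print(round(final_recovered))
```

A metapopulation version would run one such system per location and move individuals between locations according to travel volumes, which is the role transportation data played in the studies described above.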

Qualitative outbreak models

A few articles evaluated future disease outbreaks through various qualitative model types (12%). One article employed a qualitative risk assessment using the Delphi technique to elicit expert opinion on the EU outbreak risk of CCHF and RVF.32 A study in Saudi Arabia analysed serological data to infer immunity level and exposure risk and to descriptively assess RVF outbreak risk.33 Another article used field epidemiological methods (carcass detection) to set up an ‘Outbreak Alarm Network’ to warn of impending outbreaks. Specifically, Ebola positivity of simian carcasses was communicated to local health centres, predicting outbreaks and improving preparedness.34 Another article evaluated binary Lassa fever risk to populations through a machine learning model by weighing different predictor variables.35 In this category, a frequently employed data type was expert opinion. Furthermore, ground-level epidemiological data (carcasses and host immunity) were also used to understand the environmental susceptibility to outbreaks and thus estimate future outbreak risk.

Computing and internet systems

A relatively small number of articles (10%) proposed computer-based or phone-based internet systems designed to alert application users to new outbreaks and to signal case build-ups to health authorities before an outbreak. All models were applied to Zika disease except one, which was applied to MERS. Data integrated into phone applications consisted of user self-reported personal health data (eg, symptoms), patient data from health institutions and internet-based geopositional information (eg, location of nearby mosquito breeding sites, infected persons or risk factors).36 The system would then communicate the outbreak risks or the patient’s own likely infection status directly to the user/patient and to healthcare actors. Other early warning systems used Google Trends search term time series as surrogates for epidemiological data and coupled these to hospital and public health records in order to predict outbreak location and time.37 App-based models were based on theoretical exercises and synthetic data, while Google Trends studies mostly used real data. Aspects predicted often included infected user location and movement in real time, and case numbers.

Prediction validation and implementation

Forty-one per cent of articles validated their predictions against real data. Seventy-four per cent of articles cited challenges or limitations to their studies. Only 10% of prediction methods or forecasts were implemented in real public health or outbreak settings by decision makers or policymakers.

Discussion


This scoping review aimed to map the available evidence concerning the prediction of future outbreaks of the Blueprint diseases. The most frequent prediction method identified was spatiotemporal modelling and risk mapping. Furthermore, most models predicted spatial or temporal aspects of future outbreaks. Most frequently, outbreak risk in a specific time or location was predicted, qualitatively or quantitatively. While multiple models predicted future outbreak locations (eg, ‘a RVF outbreak is predicted in central Sudan’) and expected case numbers (eg, ‘12.3 million Zika cases could be expected’) with relatively high granularity, few articles besides RVF studies made temporally precise predictions (eg, ‘we predict an outbreak from June to July of 2021’).26 Specifically in the case of RVF models, habitat flooding and vector niche colayers enabled precise spatiotemporal predictions of outbreaks with time lags months in advance.38

RVF and Zika were the most studied diseases, presumably in part due to their strong reliance on a set of predictable and measurable climatic and vector/host factors.39 Zika’s recent multicontinental impact may have led to its elevated level of research.40 Comparatively few articles on outbreak prediction concerned Ebola. However, a significant number of articles that modelled current Ebola epidemics were excluded during screening.

Spatiotemporal modelling and climatic predictions

Risk mapping was the most widely used method of predicting and forecasting future outbreaks. Many RVF studies used the well-studied relationship between El Niño/Southern Oscillation (ENSO) phenomena, rainfall and cyclical patterns of outbreaks.39 The ENSO phenomenon refers to the coupling of increased sea surface temperature (SST) with specific wind and rain patterns in the central and eastern tropical Pacific.41 In East Africa, ENSO results in above average rainfall, which in turn floods the dambos where the Aedes mosquito vectors of RVF breed, a precondition for RVF outbreaks. SST and the satellite-measured normalised difference vegetation index (NDVI) are then used as surrogate variables to monitor RVF outbreak conditions.42 Many articles were able to make predictions based on this relationship together with historical outbreak and entomological niche data. This enabled the production of within-country, region-specific risk maps predicting outbreaks in East Africa. These maps guided the implementation of public health measures 2–4 months in advance of outbreaks.38 Furthermore, climate-based RVF risk maps were either incorporated into or proposed as early warning systems.38 43–45 For example, the US armed forces established a multidisciplinary early warning system that reduced the economic and health consequences of the 2006–2007 Eastern African RVF outbreak, compared with the 1997 RVF outbreak.46 We produced a case study illustrating the general RVF prediction methodology (figure 2).

Figure 2

Example case study of Rift Valley fever (RVF) outbreak prediction. Illustration adapted from prediction strategies devised by Anyamba et al43 : (1) advanced very high resolution radiometers (AVHRR) on satellites measure observations of various global to subregional variables; (2) outgoing longwave radiation (OLR), sea surface temperature (SST), normalised difference vegetation index (NDVI) and rainfall together with coordinates of previous outbreaks are integrated into outbreak risk maps; (3) risk map predictions are associated to persistent anomalies in NDVI over specific locations, for example, predicting RVF outbreaks during future time periods and enabling warnings with time lags weeks to months ahead. Warnings are transmitted as part of an early warning system to different agencies (4), which lead pre-emptive measures: information to private citizens and health personnel, vaccination drives, awareness campaigns and vector control through pesticides.
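
Step (3) of the case study, associating warnings with persistent NDVI anomalies, can be illustrated with a minimal Python sketch. The anomaly threshold, the required number of consecutive months and the NDVI values are hypothetical assumptions chosen for illustration; they are not the operational criteria of the system described by Anyamba et al.

```python
def ndvi_anomalies(observed, climatology):
    """Anomaly = observed monthly NDVI minus the long-term mean for that month."""
    return [o - c for o, c in zip(observed, climatology)]


def first_persistent_anomaly(anomalies, threshold=0.1, min_months=3):
    """Return the index of the month at which the anomaly has exceeded
    `threshold` for `min_months` consecutive months, or None if it never does."""
    run = 0
    for month, anomaly in enumerate(anomalies):
        run = run + 1 if anomaly > threshold else 0
        if run >= min_months:
            return month
    return None


# Hypothetical monthly NDVI for one location and its long-term monthly means.
observed = [0.30, 0.35, 0.48, 0.52, 0.55, 0.40]
climatology = [0.30, 0.32, 0.34, 0.36, 0.38, 0.36]

anomalies = ndvi_anomalies(observed, climatology)
warning_month = first_persistent_anomaly(anomalies)
print(warning_month)  # 4: months 2, 3 and 4 all exceed the threshold
```

Applied cell by cell to a gridded NDVI product, a rule of this kind yields a map of locations under warning, which can then be combined with the coordinates of previous outbreaks as described in step (2).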

Complex risk mapping for the Blueprint diseases

Risk maps were also used to predict diseases with more complex and/or poorly understood outbreak risk factors such as Ebola and Marburg disease, Lassa and CCHF.24 47–49 A multistage outbreak assessment of Ebola integrated broad data types such as epidemiological, vector, expert opinion, demographic, transportation and geopolitical data.50 Pandemic potential from the community to international level could thus be assessed. Outbreak risk was further divided into index case potential, outbreak potential and epidemic potential. This multilayered analysis produced an outbreak receptivity risk map for the entire African continent up to the subregional level, permitting a thorough understanding of the potential of various factors on future outbreak likelihood.50 This type of analysis may be useful in providing actionable information on very precise environments for policymakers. Predictions were also made for less well-studied diseases such as Nipah. Previous models had been limited by a paucity of spatial information.51 While an article used spatial occurrence data from Bangladesh to overcome this issue, limited application of human culture and ecological variables reduced aetiological understanding of this zoonosis.51 Other risk maps integrated socioeconomic, infrastructural and economic data, identifying societal risk factors and vulnerabilities that could theoretically be addressed pre-emptively by public health actors. Taylor et al52 identified social vulnerabilities (eg, income, disease knowledge and phone access) to RVF in East African communities, enabling the production of vulnerability maps. However, in such studies, high model uncertainty remained regarding the integration of social vulnerability parameters in predictions. Lastly, models often mixed stochastic and deterministic models and frequently used machine learning. Models such as random forest algorithms or back propagation neural networks were employed due to their ability to engage highly complex data sets.23 53–56 For example, a gradient tree boosting model integrated transportation, economic, demographic, ecological, case and vector data to create a Zika risk map estimating precise outbreak probability in the Asia Pacific region between 2016 and 2017.57

Regression analysis and time series forecasts for CCHF and Lassa fever

The most used statistical outbreak prediction methods were regression analysis and time series forecasts, often predicting future case counts. A time-trend model enabled the prediction of cases of Lassa fever, a disease with a paucity of information, 5 years in advance.48 Solar radiation analyses permitted RVF outbreak predictions 5 years in advance.58 Time series predictions employed internet analysis (Google Trends) in spatially defined regions as surrogate case data,59 or directly obtained case data from governmental health institutions.60 In the case of CCHF, case counts may be the most readily accessible and reliable model input, which may explain the application of time series analysis. However, a more layered approach for predicting spatially defined incidence and outbreaks is to couple occurrence time series with risk factors. The first prospective CCHF case prediction tool employed machine learning, 50 spatiotemporal covariates and 14 years of occurrence data, thus facilitating prediction of the resources needed (vaccines, ventilators and intensive care unit beds) and enabling preparedness (pesticides and reducing livestock import).61
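To illustrate how a surrogate signal can feed a case forecast, the following is a minimal sketch and not a method taken from any reviewed article. It assumes, purely for illustration, that weekly case counts depend linearly on a search-interest index observed some weeks earlier. The function names (fit_line, forecast_cases) and the toy numbers in the usage note are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_cases(search_index, cases, lag=1):
    """Forecast the next period's case count from a lagged search index.

    Assumption (illustrative only): cases[t] is roughly linear in
    search_index[t - lag]. Both series must have the same length.
    """
    if len(search_index) != len(cases):
        raise ValueError("series must have equal length")
    predictors = search_index[:-lag]   # index values at time t - lag
    outcomes = cases[lag:]             # case counts at time t
    intercept, slope = fit_line(predictors, outcomes)
    # The next, unobserved period is predicted from the index `lag` periods before it.
    return intercept + slope * search_index[-lag]
```

For example, forecast_cases([10, 20, 30, 40, 50], [0, 23, 43, 63, 83]) fits the toy relation exactly and extrapolates one week ahead. A realistic forecast would add seasonality, uncertainty intervals and out-of-sample validation.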

Quantitative and probabilistic SARS and Nipah virus predictions

Certain quantitative models enabled multilayered probabilistic approaches to prediction. A mechanistic risk assessment framework gauged Nipah risk and predicted that the introduction of Nipah into the European Union would be hundreds of years away.62 This prediction was based on socioeconomic and ecological zoonotic drivers including human travel, trade, live animal movements and illegal bushmeat importation. While the model offered lower spatiotemporal granularity than weather-based risk mapping, such articles enable an analysis of contemporary causes frequently cited as driving zoonotic surges (eg, bushmeat trade).63 Throughout the review, sociopolitical factors such as political instability and social vulnerability were sparsely integrated into models. A single model integrated political instability, conflict and outbreak receptivity.50 Transportation was another important data source in prediction. A probabilistic SEIR SARS model evaluated the qualitative and quantitative global risk for spread and predicted infection cases through global transportation networks.30 Case and transportation data enabled the authors to predict infected countries on an almost one-to-one basis compared with future case data, as well as the vaccination threshold for epidemic spread.30
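The role of transportation data in such models can be sketched with a two-region SEIR system. The code below is a deliberately simplified, deterministic illustration and not the probabilistic model of the cited article: the parameter values (beta, sigma, gamma), the population sizes, the symmetric travel rate and the function names are all assumptions chosen for readability.

```python
def seir_step(state, beta, sigma, gamma, dt):
    """Advance one region's (S, E, I, R) compartments by one Euler step."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt      # S -> E
    new_infectious = sigma * e * dt          # E -> I
    new_recovered = gamma * i * dt           # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)


def simulate_two_regions(days, travel_rate, beta=0.5, sigma=0.2,
                         gamma=0.25, steps_per_day=10):
    """Simulate two coupled regions; only region A starts with infections.

    travel_rate is the assumed fraction of each compartment that moves
    to the other region per day (hypothetical, symmetric coupling).
    """
    dt = 1.0 / steps_per_day
    region_a = (999_990.0, 0.0, 10.0, 0.0)
    region_b = (1_000_000.0, 0.0, 0.0, 0.0)
    for _ in range(days * steps_per_day):
        region_a = seir_step(region_a, beta, sigma, gamma, dt)
        region_b = seir_step(region_b, beta, sigma, gamma, dt)
        # Exchange travellers between the two regions.
        leaving_a = [x * travel_rate * dt for x in region_a]
        leaving_b = [x * travel_rate * dt for x in region_b]
        region_a = tuple(x - out + inc for x, out, inc
                         in zip(region_a, leaving_a, leaving_b))
        region_b = tuple(x - out + inc for x, out, inc
                         in zip(region_b, leaving_b, leaving_a))
    return region_a, region_b
```

With travel_rate set to 0, region B never records an infection; any positive travel rate seeds it. This is the mechanism by which transportation networks determine which locations are predicted to become infected. A probabilistic version would replace the fixed flows with random draws.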

Internet-based predictions of Zika and MERS

Internet-based systems represented a small yet promising domain of outbreak prediction. The studies analysed exclusively researched Zika and MERS. Methods included early warning systems integrating cloud computing and phone application-based risk mapping.36 37 56 Other models used Google search term forecasting.59 64 65 Cloud computing systems used machine learning to integrate demographic, user and geolocalised internet data to establish a real-time predictive infection and outbreak mapping system. This enabled individualised, context-specific, real-time feedback to application users, warning them of imminent outbreak risks such as nearby infected users.56 Further forecasting systems coupled Google Trends search queries with real-time epidemiological data to enable adequate predictions of outbreak and epidemic onset. Google Trends predictions were also easily verifiable against outbreak data weeks later. However, Google search methodology was often limited by its dependence on only two variables, case count and Google Trends data.64 Phone and computer systems were limited by their proof-of-concept status, as they used synthetic data sets.

Narrative and qualitative outbreak prediction

Lastly, qualitative prediction models used a wide breadth of methodologies. One article predicted RVF outbreak dynamics (onset, plateau and seasonality) by assessing regional outbreak susceptibility through livestock seropositivity and host immunity. Coupled with local climatic and agricultural data, the authors produced a narrative assessment and prediction of a future RVF outbreak in Saudi Arabia, should an introduction event occur.33 Another qualitative model established a surveillance network monitoring animal mortality to detect animal Ebola outbreaks and thus predict and prevent human outbreaks.34 Specifically, local hunters and epidemiologists identified Ebola-positive gorilla and simian carcasses and referred their observations to local healthcare actors. On two occasions, this network was able to warn the authorities in the Republic of Congo and Gabon of an imminent risk for human outbreaks.34 Other authors evaluated the RVF-related knowledge level of cattle farmers, a local group at risk of RVF contact, to inform model risk maps and in turn produce tailored awareness programmes for such persons.45

Stochasticity and determinism in predictions

60% of models reviewed were stochastic in nature. Stochasticity allows for uncertainty in modelling while respecting the inherent randomness of inferred underlying disease processes. For example, in stochastic models, it is possible for outbreaks to die out even if the reproduction number R > 1, which is not the case in deterministic modelling.66 However, stochastic processes may not always be ideal for future predictions, as the underlying disease processes may change in future environments. In contrast, deterministic models tend to remain accurate in future environments, as changes to host–pathogen dynamics or disease processes are more easily adapted into the model. Regardless, deterministic models pose a significant challenge in terms of complexity, especially for the diseases studied herein that lack a well-established body of literature. This, and the relative simplicity of stochastic models, may explain part of the reliance on stochastic models.
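The point that a stochastic outbreak can die out even when R > 1 can be made concrete with a branching process. The sketch below is illustrative only: it assumes each case infects a Poisson-distributed number of new cases with mean R, starts from a single index case and treats an outbreak as established once it reaches an arbitrary cap. The function names and default values are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate using Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def outbreak_dies_out(r, rng, max_generations=50, cap=200):
    """Simulate one outbreak from a single case; True if it goes extinct."""
    infected = 1
    for _ in range(max_generations):
        if infected == 0:
            return True
        if infected >= cap:          # assumed threshold for an established outbreak
            return False
        infected = sum(poisson(rng, r) for _ in range(infected))
    return infected == 0


def extinction_fraction(r, runs=1000, seed=1):
    """Fraction of simulated outbreaks that die out for reproduction number r."""
    rng = random.Random(seed)
    return sum(outbreak_dies_out(r, rng) for _ in range(runs)) / runs


def deterministic_cases(r, generations):
    """Deterministic counterpart: cases per generation grow as r ** g."""
    return [r ** g for g in range(generations + 1)]
```

Under these assumptions, calling extinction_fraction(1.5) should return a value near the analytic extinction probability of about 0.42 for Poisson offspring with mean 1.5, whereas deterministic_cases(1.5, 10) grows in every generation and never reaches zero.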

Convergence and divergence in data sources

In general, data tended to diverge rather than converge on common sources. Foremost, and inherent to the research question, we compared 58 prediction models across nine different diseases on a global scale over a time period of 20 years. Different diseases and different analysed environments resulted in diverging data sources. Second, heterogeneous study methods yielded different requirements in terms of data needed. Third, models often integrated authors’ own assumptions. Lastly, multiple data sources were literature based and these sources have varied and changed over the 20-year time period studied. Convergence in data sources, when seen, was most often for climate data through the use of NASA and National Oceanic and Atmospheric Administration satellites or climate databases such as WorldClim. The listed data sources by disease can be found in online supplemental material.

Avenues in SARS-CoV-2 outbreak prediction

Taken together, our results delineate several avenues worth exploring in the prediction of SARS-CoV-2 outbreaks. Compartmental SEIR and time series models may currently be the most readily available models to make predictions for new COVID-19 outbreaks. However, such models must be continuously adapted as fluid governmental measures (eg, lockdowns) and changing social norms upend the underlying assumptions on which many of these models are based. Increasingly, machine learning algorithms such as Bayesian inference can help in evaluating large data sets with fluctuating underlying assumptions and high uncertainties in data value, as is the case with the current influx of COVID-19 data sets.17 Certain literature shows promise in combining SEIR/SIR models with machine learning algorithms.67 Climate-based predictions, while rewarding for RVF, may be less applicable in predicting SARS-CoV-2. The virus is endemic, and the principal vectors are humans. Models would thus need to evaluate human vectors, possibly through behavioural and transportation data. Behavioural studies, such as evaluating MERS risk during the Hajj pilgrimage, and transportation analyses for SARS, may thus be of interest in the context of the current pandemic.29 30 Furthermore, as current predictions are for the most part context specific and on the regional to national scale, agent-based models could be useful in integrating these transportation and behavioural dynamics on a confined local, regional or even national scale. No articles in this review applied agent-based models in predictions (online supplemental material 1). While various model types are emerging in the prediction of COVID-19, there is no consensus on optimal models for COVID-19 prediction, and further research is needed towards predicting COVID-19 outbreaks.

Prediction validation

While 41% of articles validated their predictions against real data, there was significant heterogeneity in how outbreak prediction validation was demonstrated. Certain articles graphically compared their predictions with real data, whereas others merely stated that their data had matched the predicted outbreak. Furthermore, many articles focused on whether their predicted outbreaks occurred, rather than on outbreaks their models had missed. However, certain articles made easily verifiable outbreak predictions. For example, multiple articles predicted the RVF outbreak in 2006–2007.38 43 Of note, a single article predicted the Ebola outbreak in 2014 by predicting a peak of infectious bats in the region where the 2014 West African Ebola outbreak occurred.49 Finally, 59% of articles did not validate their predictions against future data, often publishing before the timeline of predicted outbreaks. While certainly an avenue for further research, evaluating the accuracy of outbreak predictions exceeded the scope of this review.
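A validation that accounts for missed outbreaks as well as confirmed ones can be summarised in a few lines. The following sketch is hypothetical and was not used in this review. It assumes that predictions and observations are available as aligned yes/no values per region and period, and it reports sensitivity (the share of observed outbreaks that were predicted) alongside positive predictive value (the share of predicted outbreaks that occurred).

```python
def validation_summary(predicted, observed):
    """Compare predicted and observed outbreaks for aligned region-periods.

    predicted, observed: equal-length sequences of booleans, one entry per
    region-period (an assumed data layout, for illustration only).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    hits = false_alarms = missed = 0
    for was_predicted, did_occur in zip(predicted, observed):
        if was_predicted and did_occur:
            hits += 1
        elif was_predicted and not did_occur:
            false_alarms += 1
        elif did_occur:
            missed += 1              # outbreak occurred but was not predicted
    observed_total = hits + missed
    predicted_total = hits + false_alarms
    return {
        "hits": hits,
        "false_alarms": false_alarms,
        "missed": missed,
        # None when the ratio is undefined (no outbreaks observed or predicted).
        "sensitivity": hits / observed_total if observed_total else None,
        "positive_predictive_value": hits / predicted_total if predicted_total else None,
    }
```

Reporting only the hits, as many articles did, corresponds to reading a single entry of this summary. The missed count is what reveals outbreaks a model failed to anticipate.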

Limitations in predictions

Limitations in the articles, when discussed, referenced the inherent complexity of epidemiological processes and the necessary simplification of their models. Another challenge was incomplete, unreliable or scarce data and patchy surveillance networks. This may have caused pseudoabsence scenarios in modelling, in which locations without recorded outbreaks may still have experienced transmission. Certain articles only used synthetic data to make predictions, hindering validation.

Challenges in reviewing

This review also had methodological challenges and limitations. Separating articles between modelling of current epidemics on the one hand and prediction of future outbreaks on the other hand was a demanding screening challenge requiring thorough full-text assessments. Furthermore, a semantic challenge arose when certain studies equated transmission risk with outbreak risk, while others treated transmission risk as only one part of future outbreak risk. We thus excluded articles that merely evaluated transmission risk or spillover risk, defined as transmission of disease from one individual to another. Lastly, only the term Disease X was employed for an as-yet undiscovered pathogen in order to evaluate the Blueprint list’s impact on research. This may have reduced the number of studies identified predicting outbreaks of undiscovered pathogens.


While there is ample research on modelling existing epidemics, the current review shows a significant literature gap in prediction and forecasting of future outbreaks of the Blueprint priority diseases. Only a few articles attempt true spatiotemporal prediction. The most common scenario, RVF risk mapping through vector, occurrence and climate data, appears to be precise geographically and temporally and has a track record of enabling prospective interventions that mitigated outbreaks. Even so, the warnings were presented when conditions were already ripe for RVF outbreaks.68 Thus, there may be room for improvement of early warning systems by including local health actors (eg, farmers, forest rangers and hunters) as frontline epidemiological personnel to warn of future outbreaks and implement public health strategies.34 68 Furthermore, only a few articles sought to predict outbreaks through common zoonotic drivers such as ecological destruction, wildlife trade, conflict or political data and measures of social vulnerability. A significant research gap also concerned the integration of indicators of health system capacity, government effectiveness or health emergency preparedness into models. International Health Regulations factors such as State Party Self-assessment Annual Reports, Joint External Evaluations and Global Health Security Index measures were not integrated into the models reviewed. However, these indicators are an imperfect assessment of health capacities, and there is ongoing debate as to how best to measure pandemic preparedness, which may explain the paucity of integration of such data in the analysed models.69

This article attempted to shift the focus from outbreak risk factors to prediction, as prediction integrates risk factors into actionable information. While fraught with inherent uncertainty and stochasticity, prediction may be a pragmatic public health and research avenue that warrants support. Prediction of spatiotemporal and epidemiological qualities of future outbreaks enables public health actors to pre-emptively act on explicit spatiotemporal information. Even so, a major challenge remains in foreseeing novel zoonoses. No articles studied Disease X outbreak prediction. Ebola, SARS, Zika and COVID-19 were all Disease X before their initial outbreaks. While the Blueprint list includes SARS and MERS, no articles in this review predicted an outbreak of COVID-19 in 2019. This demonstrates that efforts to predict the future Disease X remain a major gap in the literature.

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.

Ethics statements

Patient consent for publication


The main author would like to thank Alexia Trombert and Cécile Jaques at the Lausanne University Hospital Library for their help with the database searches; Sarah Nicollier, Alexander Jucht and Josh Reisler for proofreading; and Amaury Thiabaud for his help in reviewing and selecting articles.



  • Handling editor Seye Abimbola

  • Contributors NJ contributed to the study design, analysed the data and wrote the manuscript. AF initiated the project, contributed to the study design, discussed the results and revised the manuscript. VD'A discussed the results, advised on study content and data presentation and revised the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
