Epidemics are influenced by both disease and societal factors and can grow exponentially over short time periods. Epidemic risk analysis can help in rapidly predicting potentially serious outcomes and flagging the need for rapid response. We developed a multifactorial risk analysis tool ‘EpiRisk’ to provide rapid insight into the potential severity of emerging epidemics by combining disease-related parameters and country-related risk parameters. An initial set of 18 disease and country-related risk parameters was reduced to 14 following qualitative discussions and the removal of highly correlated parameters by a correlation and clustering analysis. Each of the remaining parameters was assigned one of three risk levels: low (1), moderate (2) or high (3). The total risk score for an outbreak of a given disease in a particular country is calculated by summing these 14 risk scores, and this sum is subsequently classified into one of four risk categories: low risk (<21), moderate risk (21–29), high risk (30–37) and extreme risk (>37). Total risk scores were calculated for nine retrospective outbreaks, demonstrating an association with the actual impact of those outbreaks. We also evaluated to what extent the risk scores correlate with the number of cases and deaths in 61 additional outbreaks between 2002 and 2018, demonstrating positive associations with outbreak severity as measured by the number of deaths. By predicting the risk of emerging outbreaks in real time, EpiRisk can support timely intervention, which may help government and public health professionals prevent catastrophic epidemic outcomes.
- public health
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
What is already known?
There has been an increase in the number of serious infectious disease outbreaks worldwide in recent years.
Rapid response time is critical for epidemic-prone diseases, and short delays to outbreak response result in preventable morbidity and mortality.
Risk analysis can be used to prioritise a response to an epidemic by examining the potential impact of the infectious disease outbreak.
What are the new findings?
EpiRisk provides rapid risk prediction of outbreaks that will assist decision makers in epidemic management.
Both pathogen and country parameters have a significant impact on the risk of infectious disease outbreaks in particular countries or regions.
EpiRisk provides an individualised approach where specific input data for a country and disease can be used, rather than a standard ‘one size fits all’ approach that would be less generalisable.
What do the new findings imply?
The development of a simple risk analysis tool will be useful for global epidemic control. This tool can be used to rapidly predict the risk of outbreaks, which is useful when planning and prioritising interventions or for epidemic preparedness.
An appropriate and timely intervention can help governments and public health professionals prevent catastrophic outcomes.
Since 2000, there has been an escalation in the number of serious infectious disease outbreaks worldwide.1 2 For example, the 2002–2003 severe acute respiratory syndrome coronavirus (SARS-CoV) epidemic spread to 37 countries, resulting in over 8000 cases and more than 700 deaths.3 In 2009, a new influenza subtype, A/H1N1pdm09, emerged in North America before circulating around the world, affecting an estimated 24% of the global population.4 5 The 2013–2016 Ebola outbreak in West Africa was an unprecedented catastrophe, and Ebola continues to pose a global threat.6 Zika virus spread from Brazil and infected an estimated 1.3 million people, resulting in upwards of 4000 cases of microcephaly among infants.7 8 Outbreaks of acute flaccid myelitis have also occurred across the USA, Europe and Asia, associated with EV-D68.9 There has also been an increase in the number of vaccine-preventable disease outbreaks in recent years, including polio,10 measles11 and diphtheria.12
Lessons learnt from recent serious epidemics show that control measures may not be sufficient or timely enough,13 and even short delays to outbreak response result in preventable morbidity and mortality.3 14 15 The response to the 2013–2014 West African Ebola outbreak is an example of a delayed intervention. Traditional laboratory and surveillance systems misdiagnosed Ebola victims as Lassa fever and cholera.15 The first confirmed laboratory case detected by scientists in France occurred approximately 2.5 months after illness in the index case, and intervention by WHO was delayed for at least 5 months after that.15 16 This failure in the early stages of the outbreak had significant follow-on effects, contributing substantially to an exponential increase in epidemic size to over 28 000 cases and 11 000 deaths.6 In 2015, 21 polio-free countries reported re-emergence of the virus with a total of over 1000 confirmed cases,10 with a slow response implicated as a factor in the impact of these outbreaks.17 Rapid response time is critical for epidemic-prone diseases, and decision support tools to prompt rapid response may be useful.
Globally, outbreaks vary widely in severity and impact,18 so tools that can predict risk in the early stages may help identify those with catastrophic potential. Both pathogen and country parameters have a significant impact on the risk of infectious disease outbreaks in a particular country or region.19 For example, innate characteristics of the pathogen such as the type of pathogen, transmission mode, basic reproductive number and case fatality rate, as well as the availability of effective therapy and vaccine for that particular disease, all have a role in predicting the risk of an epidemic.20 Country factors including the social, economic and cultural characteristics of a country may also contribute to the risk of an outbreak. Risk assessment frameworks exist for polio, measles and dengue, which are used in endemic countries such as the Philippines and Romania.21–23 However, a simple, universal epidemic risk scoring framework would be valuable for predicting the risk of an infectious disease outbreak and prioritising the response.
There are limited studies assessing the risk of infectious disease outbreaks based on country features. In 2017, Ajisegiri et al19 used such an approach to predict the risk and outcome of the Ebola epidemic worldwide by considering country-specific parameters based on socioeconomic, health system, cultural and geographical factors. The risk framework by Ajisegiri successfully stratified the risk of Ebola outbreaks by country and predicted a much higher risk in West Africa compared with the USA or UK. Had such a tool been available in March 2014, it may have helped prompt a more rapid response. This study aimed to develop a generalised epidemic risk analysis framework that incorporates both pathogen and country-specific parameters to classify the risk of epidemics and provides an early warning system for the management of global outbreaks.
We developed a risk analysis framework to predict the risk of an epidemic of any cause by modifying the approach used in the Ebola-specific tool developed by Ajisegiri et al. In 2018, Ajisegiri et al19 identified the role of sociodemographic features of the Ebola-affected countries in the magnitude of the outbreak. The authors developed a framework that assigned risk scores (from 1 to 3) to country-specific characteristics such as socioeconomic, health systems and geographical factors, as well as cultural beliefs and traditional practices. These risk scores were then added into a simple summation model to produce a single risk figure for a given country. Our modified approach differs from that by Ajisegiri et al in two key ways. First, we expanded the scope of the framework developed by Ajisegiri et al19 to be suitable for almost any outbreak in any country (pending data availability) by including both disease-specific and country-specific parameters. Second, we used a subset of the country parameters used by Ajisegiri et al. A key objective of the framework presented in this paper is to allow for a quick and simple evaluation of outbreak severity risk in countries around the world. Some factors used by Ajisegiri et al require data that are not easily or quickly available (eg, screening at borders and bush meat consumption), and while those factors were suitable for evaluating a local region such as that in the Ebola outbreak, they do not meet the needs of our global framework. Data were collected on outbreak parameters that fall into two different categories: disease-related parameters and country-related parameters, as outlined in table 1.
Selection of initial risk parameters and data collection
We searched the literature to identify common risk factors of infectious disease outbreaks using three online databases: PubMed/Medline, Scopus and CINAHL using keywords ‘outbreak* OR epidemic* OR pandemic* OR emerging disease* OR re-emerged disease*’. We identified numerous initial factors associated with the severity of the disease outbreaks,24–27 which we categorised as either disease-related parameters or country-related parameters. Additional disease-related parameters were selected based on a modified Disease Attribute Intelligence System risk assessment tool by the Institute of Environmental Science and Research Limited,20 while additional country-related parameters were identified based on the risk analysis framework tool by Ajisegiri et al.19 The financial capability and the level of investment of a country play a significant role in the early response to outbreaks.28 Sociopolitical factors also contribute to the length and outcome of outbreaks. For example, war and conflict provide ideal conditions for outbreaks of infectious diseases. In conflict areas, health professionals flee, infrastructure is destroyed and the supply of medical equipment is halted.29 In some instances, hampering of immunisation efforts has also contributed to the spread of vaccine-preventable diseases,30 making a country more vulnerable to outbreaks. The majority of severe disease outbreaks also originate from densely populated regions.31 Overcrowding coupled with lower living standards can lead to the efficient transmission of diseases and is positively correlated with the risk of an outbreak.32 The existing health system within the country also plays a crucial role in the risk of outbreaks. An epidemic can spread wider and faster in a country with a weak health system.33
The infectiousness of pathogens also contributes to the risk level of an outbreak.34 Diseases caused by viral pathogens are likely to transmit more rapidly than many bacterial diseases because of the nature of viral replication and their mode of transmission;35 pathogens such as measles are more infectious partly because they spread by the respiratory route and have a high R0.36 Cholera is spread via the faecal–oral route, which is not as rapid as airborne transmission; however, it has a high reproductive number.36 During the South Sudan epidemic in 2016, cholera spread rapidly, resulting in over 20 000 cases and 436 deaths. Besides infectiousness, the availability of control measures for prevention and treatment also has an impact on the overall risk of an outbreak, because such measures make it easier to prevent and control epidemics. Vaccination reduces the risk of infections by reducing the number of susceptible people in the population.37 Treatment availability affects the risk of disease outbreak by reducing the duration, infectious period and severity of disease.38
Initially, 18 parameters were selected to calculate overall epidemic risk (table 1). Explanations of each parameter can also be seen in table 1. Sources of data for country-related parameters included the World Bank, WHO Global Health Observatory and the Peace Institute. For disease-related parameters, a mixture of grey literature and official sources including Centers for Disease Control and Prevention and WHO were used.
The preparation of country-specific data involved collection of both current and historical data points for 198 countries from the World Bank database. The historical data were required to allow for a valid evaluation of the tool on historical outbreaks. That is, we used country-specific data points (eg, gross domestic product and physicians’ numbers) that were relevant at the time of the outbreaks used for evaluation.
As part of the collection of pathogen-specific data, we performed a review of three outbreak databases: ProMED-mail,39 Healthmap40 and EpiWatch41 to collect a line-list of diseases from recent outbreak events for evaluation of the tool.
Criteria for allocating risk scores
We applied values ranging from one to three for each parameter to indicate the level of risk, where ‘low risk’ was denoted by ‘1’, ‘moderate risk’ was denoted by ‘2’ and ‘high risk’ was denoted by ‘3’. For binary parameters that had only two risk groups (for example, asymptomatic transmission, vaccine availability and drug availability), we applied only the minimum and maximum values, scoring ‘yes’ as 1 and ‘no’ as 3. Criteria for assigning risk scores to selected parameters were taken from the relevant studies.19 20 The values applied were based on specific risk factor criteria and are detailed in table 1. Where no data were available for a parameter (N/A=not available), the highest risk score was assigned.
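The scoring rules above can be illustrated with a minimal Python sketch. The function names and the numeric cut-offs in the usage lines are hypothetical and chosen only for illustration; the actual criteria for each parameter, including the direction in which ‘yes’ and ‘no’ map to risk, are those defined in table 1.

```python
from typing import Optional

LOW, MODERATE, HIGH = 1, 2, 3


def score_binary(answer: Optional[bool]) -> int:
    """Score a two-level parameter: 'yes' -> 1, 'no' -> 3.

    Missing data (None) receives the highest risk score, as described in the text.
    """
    if answer is None:
        return HIGH
    return LOW if answer else HIGH


def score_three_level(value: Optional[float], moderate_from: float, high_from: float) -> int:
    """Score a numeric parameter for which larger values mean higher risk.

    moderate_from and high_from are ILLUSTRATIVE cut-offs, not the table 1 criteria.
    """
    if value is None:
        return HIGH  # no data available: assume the worst case
    if value >= high_from:
        return HIGH
    if value >= moderate_from:
        return MODERATE
    return LOW


# Hypothetical inputs for a single outbreak
vaccine_available = score_binary(True)    # scored 1
drug_available = score_binary(None)       # no data, scored 3
case_fatality = score_three_level(5.0, moderate_from=1.0, high_from=10.0)  # scored 2
```

For parameters where a lower value means higher risk (for example, physician density), the comparison would simply be reversed.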
Selection of final parameters
The initial set of 18 parameters was reduced to 14 as a result of qualitative discussions and a correlation and clustering analysis on the country parameters. The country parameter ‘roadways/transport network’ was removed due to conflicting interpretations of risk: (1) it may indicate better access to healthcare services, or (2) it may increase contact between people and hence the risk of infection. The disease factor ‘Disease identification method’ was removed as it provided little variance, with most diseases requiring further investigation for confirmatory diagnosis. The ‘reservoir’ factor was removed as its mapping to risk is neither consistent nor straightforward.
After the qualitative determination to remove the three aforementioned factors, leaving 15 factors (eight country, seven disease), we performed a quantitative analysis to compute the correlations between all remaining country parameters for the 198 countries in our dataset. As some of the country factors (see table 1) consist of ordinal level data, we used the Kendall correlation coefficient to compute the correlations. See figure 1 for the results between the eight country factors.
The correlation results demonstrated that income, hospital bed density, physician density, and nurse and midwife density were highly correlated factors. We decided that income and hospital bed density were both useful information to show in the framework, and we include them both, despite the Kendall correlation of 0.53. Retaining both gives the information they share extra weight in the model, a bias we address in our conclusion. The correlation between physician density and nurse and midwife density is 0.66, and we decided they represent similar information about the healthcare system of a country; we therefore chose to retain physician density as a factor over nurse and midwife density, because of physicians’ diagnostic and therapeutic skills and because historical nurse and midwife density data were slightly less available for some countries. The initial 18 parameters were reduced to 14 after the reduction process described above (the remaining factors are shown in bold in table 1).
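To make the correlation step concrete, the following is a minimal pure-Python sketch of the Kendall tau-b coefficient, the variant that adjusts for the ties that ordinal scores produce. It is an illustration rather than the analysis code used in this study (which was run in R), and the two score lists are invented example data, not values from our dataset.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation between two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied on both variables: excluded from tau-b
        if dx == 0:
            ties_x_only += 1    # tied on x only
        elif dy == 0:
            ties_y_only += 1    # tied on y only
        elif dx * dy > 0:
            concordant += 1     # pair ordered the same way on x and y
        else:
            discordant += 1     # pair ordered in opposite ways
    untied_pairs = concordant + discordant
    denominator = sqrt((untied_pairs + ties_x_only) * (untied_pairs + ties_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Invented ordinal risk scores (1-3) for two country factors in six countries
physician_density_score = [1, 1, 2, 2, 3, 3]
nurse_density_score = [1, 2, 2, 2, 3, 3]
tau = kendall_tau_b(physician_density_score, nurse_density_score)
```

A pair of factors with a high coefficient carries largely overlapping information, which is the reasoning behind retaining only one of physician density and nurse and midwife density.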
In our framework, an outbreak in a given country is given risk scores in each of the seven disease factors and seven country factors. These scores are then summed to produce a disease score and a country score for the outbreak, which are then in turn summed to produce a final risk score. The minimum possible risk score is 14 (all 1s) and the maximum is 42 (all 3s). The total score was classified into four risk categories to rank the priority. The categories are ‘low risk’ for a score below 21, ‘moderate risk’ for a score of 21–29, ‘high risk’ for a score of 30–37 and ‘extreme risk’ for a score above 37. The cut-offs were determined based on expert knowledge of historical outbreaks.
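The summation and classification just described can be sketched in a few lines of Python. The function names are illustrative assumptions; the cut-offs and the seven-plus-seven structure are those stated above.

```python
def total_risk(disease_scores, country_scores):
    """Return (disease score, country score, total score) for one outbreak."""
    for scores in (disease_scores, country_scores):
        if len(scores) != 7:
            raise ValueError("expected seven factor scores")
        if any(s not in (1, 2, 3) for s in scores):
            raise ValueError("each factor score must be 1, 2 or 3")
    disease = sum(disease_scores)
    country = sum(country_scores)
    return disease, country, disease + country


def classify(total):
    """Map a total risk score (14-42) to one of the four risk categories."""
    if not 14 <= total <= 42:
        raise ValueError("total risk score must lie between 14 and 42")
    if total < 21:
        return "low risk"
    if total <= 29:
        return "moderate risk"
    if total <= 37:
        return "high risk"
    return "extreme risk"


# Hypothetical outbreak: the factor scores below are invented for illustration
disease, country, total = total_risk([3, 3, 2, 3, 3, 1, 3], [3, 2, 3, 3, 2, 3, 3])
category = classify(total)  # total is 37, which falls in 'high risk'
```

Because the category boundaries are sharp, a total of 37 and a total of 38 fall into different categories even though they differ by a single point on one factor.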
The historical and current country data, together with the disease data, are stored in csv files and managed in the R statistical software package for analysis. The initial data collection was managed in Excel.
Patient and public involvement statement
Involvement of patients or public in this research was not applicable.
We evaluated our risk framework in two ways. First, to demonstrate the use of the framework, we applied it to a number of past outbreaks and showed that our computed risk scores provided insight into the severity of nine historic outbreaks. Second, we collected data on 61 different outbreaks between 2002 and 2018 and showed to what extent the risk scores computed in our model correlate with the number of cases and deaths in those 61 outbreaks.
First, to demonstrate the use of the framework, we selected nine different outbreaks with varied outcomes, from different countries with low, medium and high incomes from the last 5 years (2015–2019) in order to cover a range of disease-related and country-related parameters. Table 2 presents these outbreaks together with the country and disease risk scores computed using our framework. The outbreaks consist of the following: hepatitis A in Australia (2018), Ebola in the USA (2018), measles in Japan (2018), diphtheria in Bangladesh (2017), Zika in Brazil (2015), hepatitis A in Italy (2013), Ebola in Sierra Leone (2014), cholera in South Sudan (2016) and Lassa fever in Nigeria (2018).
These nine historical outbreaks were divided into three groups based on similarity of duration, cases/deaths and international aid:
Group 1: hepatitis A outbreak in Australia, Ebola outbreak in the USA and the measles outbreak in Japan. These three epidemics lasted for less than 3 months, had only a few cases, few fatalities and received no international support.
Group 2: diphtheria in Bangladesh, Zika virus in Brazil and hepatitis A in Italy all shared similar characteristics: the number of cases was high, but the case–fatality rate was low. The Zika outbreak in Brazil affected 218 931 people but caused only 11 deaths. The outbreak duration in this group varied from 6 weeks to 18 months. Both Bangladesh and Brazil obtained international aid, while Italy managed the outbreak with its own resources.
Group 3: Ebola in Sierra Leone, cholera in South Sudan and Lassa fever in Nigeria include large epidemics with a long duration and high case fatality and morbidity rates. All group 3 outbreaks required international aid to control the outbreak.
To demonstrate the extent to which the model risk scores correlate with outbreak severity, we collected data on 61 different (non-endemic) outbreaks between 2002 and 2018 of 18 different pathogens in 43 different countries with a wide range of number of cases (mean 25 691, median 804, first quantile 100, third quantile 2734) and deaths (mean 347, median 14, first quantile 1, third quantile 76) per outbreak. Figure 2 shows the distribution of the total risk score computed for the 61 outbreaks in the dataset.
For our evaluation, we split the outbreak risk scores into four quantiles, and we split the number of deaths into four quantiles. That is, Q1 of risk scores represents the 15 outbreaks in our evaluation set with the lowest risk scores and Q4 represents the 15 outbreaks in our evaluation set with the highest risk scores. Similarly, Q1 of deaths represents the 15 outbreaks that resulted in the lowest number of deaths and Q4 represents the 15 outbreaks that resulted in the highest number of deaths. We then tabulated these two quantile groups against each other to evaluate to what extent outbreaks with the highest number of deaths were also the outbreaks with the highest risk scores in our framework and similarly for the lower quantiles. See table 3 in the results section for the result.
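The cross-tabulation described above can be sketched as follows. This is an illustrative reconstruction, not the analysis code used in the study: it assumes a simple rank-based split into four groups with ties broken by input order, so group sizes for 61 outbreaks may differ slightly from those reported, and the eight example outbreaks are invented.

```python
from collections import Counter


def quartile_labels(values):
    """Assign each value a group label 1-4 according to its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort: ties keep input order
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * 4 // n + 1
    return labels


def cross_tabulate(risk_scores, deaths):
    """Return a 4x4 table: rows are death groups Q1-Q4, columns are risk groups Q1-Q4."""
    risk_q = quartile_labels(risk_scores)
    death_q = quartile_labels(deaths)
    counts = Counter(zip(death_q, risk_q))
    return [[counts[(d, r)] for r in range(1, 5)] for d in range(1, 5)]


# Invented example: eight outbreaks whose deaths rise in step with their risk scores
example_risk = [20, 22, 25, 27, 30, 32, 36, 40]
example_deaths = [0, 1, 5, 3, 50, 40, 900, 1000]
table = cross_tabulate(example_risk, example_deaths)
```

In this invented example every outbreak falls on the diagonal of the table; in real data, the off-diagonal cells show the outbreaks whose severity was over-predicted or under-predicted by the risk score.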
Table 1 presents the sourced parameters for the nine outbreaks listed in table 2, with the corresponding risk score and risk classification. The total risk score in table 2 is the sum of the total country score and total disease score. Australia has the lowest country score, and the overall risk score for Australia is the lowest. Likewise, Japan had a low country score. In contrast, Nigeria had the highest total country score. Bangladesh also had a high country score; however, diphtheria’s disease-related score is the lowest among all diseases tested, so the overall risk score was not high enough to characterise the outbreak as extreme risk, and it was characterised as high risk overall.
For our quantitative evaluation of risk scores computed on an evaluation dataset of 61 outbreaks, we first split the outbreaks into quantiles. Each of the 61 outbreaks was placed in one of four quantiles for the number of deaths associated with the outbreak. An outbreak in Q1 was an outbreak where the number of deaths was in the lowest 25% of the full dataset of 61 outbreaks. An outbreak in Q4 for the number of deaths has a number of deaths associated with it that is within the highest 25% of the full dataset. The same method is used to place the outbreaks into four quantiles of risk scores as computed by our risk framework. Table 3 shows the result. What is promising is that of the 16 outbreaks that resulted in the highest number of deaths (Q4), 12 were in the top two quantiles in terms of risk score (Q3/Q4). However, the other four were in the lower two quantiles of risk score. Of the 15 outbreaks in our dataset that resulted in the fewest deaths (Q1), 11 were in the bottom two quantiles in terms of risk (Q1/Q2) and four were in the top two quantiles of risk (Q3/Q4). An equally promising pattern is shown for outbreaks in Q2 and Q3.
There is a clear association between the risk scores computed by our framework and the severity of outbreaks as measured by the number of deaths, but there are also outbreaks that fall outside of the risk categories one would wish to see. For example, one of the most severe outbreaks was in the bottom 25% of risk scores, and three of the least severe outbreaks were in the highest 25% of risk scores. Nonetheless, the overall association exists, and in future work, we intend to make the relationship stronger.
We found that risk calculations by EpiRisk correlated well with epidemic impact using our test dataset of historical epidemics. Both disease-related parameters and country-related parameters significantly contribute to the overall risk of epidemics. The EpiRisk tool is a simple and rapid risk scoring framework for global infectious disease outbreaks that could be used to prioritise rapid and effective epidemic response.42 Lessons from the severe acute respiratory syndrome (SARS) outbreak in early 2003 indicate that the timing of outbreak detection is a key factor in the success of interventions, and delays in outbreak detection can limit the efficacy of such interventions. Delays occurred in the 2013–2016 West African Ebola outbreak, which resulted in over 28 000 cases and over 11 000 deaths and had a substantial economic cost for the affected countries.43 A risk scoring framework, if used early in the outbreak, might have predicted the need for a rapid response and prevented a catastrophic outcome.19 This approach can also be used if there is more than one epidemic within a country, particularly if resources are limited. A higher risk score calls for an urgent and aggressive intervention, including aggressive case and contact identification, isolation and management, and extreme social distancing nationally, whereas a less aggressive intervention, such as standard precautions and routine surveillance at the local level, would be appropriate for a lower risk score. For example, if an outbreak of Lassa fever occurred in Nigeria at the same time as outbreaks of other infectious diseases such as polio and cholera, the tool could assist the government in prioritising urgent interventions for Lassa fever because it is the higher-risk disease, and therefore in minimising the overall impact. Such a tool may assist with prioritisation and early decision making. It may also help international organisations to prioritise countries in most need of urgent assistance.28
A limitation of this study is the dependence on the quality of the data inputs, such as the validity of the data sources, and the use of fixed disease parameters of known pathogens. The data used to calculate the risk scores were collected from various sources and from different points in time, and in some cases are out of date or contain biases. On an evaluation set of 61 outbreaks, our results show that the risk score computed using our framework has a positive association with outbreak severity as measured by the number of deaths associated with an outbreak. In future work, we intend to improve the model to make this association stronger. At the moment, our model is a simple linear summation of risk factors. We intend to develop a larger dataset of historical outbreaks, at which point it will become viable to develop individual weights for the factors in the model that we hope will improve the strength of the association between risk score and outbreak severity. It should also be noted that the risk categories (and their cut-offs) do lead to a situation where a unit difference in risk can change the risk category entirely. The framework is also currently limited to known pathogens. This limits the tool’s application during the early moments of an outbreak, when the pathogen may be unknown and awaiting diagnostic confirmation. This also includes the emergence of novel pathogens, where many disease parameters required for input such as the basic reproductive number (R0), mode of transmission and case fatality rate are likely to be uncertain. However, in the case of COVID-19 as an example, as the pandemic progressed, these parameters quickly became known.
For COVID-19, several of the high-scoring features outlined in table 1, such as asymptomatic transmission, respiratory spread, viral infection and high case fatality rate, were present.44 45 Future iterations of the tool could allow the input of these disease parameters manually, and various potential scenarios could be tested by varying the parameters individually, such as variations in R0 to account for uncertainty early in an epidemic. The framework is meant to be used as a whole, where the user ought to look at the risk score, the disease score, the country score and then the individual risk factor scores for deeper insight. Despite these limitations, EpiRisk provides an individualised approach where specific input data for a country and disease can be used, rather than a standard ‘one size fits all’ approach that would be even less generalisable. We believe this country-level approach is more rigorous and can predict outbreak risk more accurately than relying on disease or country factors in isolation.
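The scenario testing suggested above for future iterations could look something like the following sketch, which is a hypothetical illustration and not a feature of the current tool. It holds the known factor scores fixed, lets each uncertain factor take every score from 1 to 3, and lists the totals and risk categories that result. The function names and the example scores are assumptions.

```python
from itertools import product


def classify(total):
    """Risk category for a total score, using the cut-offs of the framework."""
    if total < 21:
        return "low risk"
    if total <= 29:
        return "moderate risk"
    if total <= 37:
        return "high risk"
    return "extreme risk"


def scenario_outcomes(known_scores, n_uncertain):
    """List every (total, category) reachable when n_uncertain factors vary from 1 to 3."""
    if len(known_scores) + n_uncertain != 14:
        raise ValueError("known and uncertain factors must add up to 14")
    base = sum(known_scores)
    totals = {base + sum(combo) for combo in product((1, 2, 3), repeat=n_uncertain)}
    return [(total, classify(total)) for total in sorted(totals)]


# Invented example: 12 factors are known; R0 and case fatality rate are still uncertain
known = [2] * 10 + [3] * 2
outcomes = scenario_outcomes(known, n_uncertain=2)
```

In this invented example the possible totals run from 28 to 32, so the outbreak could be classed as either moderate or high risk until the uncertain parameters are established. This also illustrates how a one-point difference can move an outbreak across a category boundary.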
The development of a simple risk analysis tool will be useful for global epidemic control. We have demonstrated in principle that EpiRisk can assess the level of epidemic risk for individual epidemics and that it performed well when tested against an initial set of real-world epidemics. This tool can be used to rapidly predict the risk of outbreaks, which is useful when planning and prioritising interventions or for epidemic preparedness. An appropriate and timely intervention can help governments and public health professionals prevent catastrophic outcomes. To improve the relevance for current or future prediction, the development of a real-time tool that provides the most current data for risk prediction is crucial. The involvement of local government and other health organisations to improve the data is also important. In future work, we will evaluate the generalisability of EpiRisk through the use of a more comprehensive test set of outbreaks.
Handling editor Seye Abimbola
Contributors DASL: data analysis, interpretation, study design, drafting and revision of the manuscript; PV: data analysis, interpretation, study design and revision of the manuscript; AM and DCA: interpretation, drafting and revision of the manuscript; CRM: conception of the study, interpretation, study design and revision of the manuscript.
Funding This work was supported by a grant from the National Health and Medical Research Council (NHMRC) Centre for Research Excellence in Integrated Systems for Epidemic Response (grant number 1107393). CRM is supported by an NHMRC Principal Research Fellowship, grant number 1137582.
Competing interests CRM has received funding for investigator-driven research from Seqirus and Sanofi unrelated to this study. CRM has also been on advisory boards for the same companies.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data collected for the purpose of this study are publicly available. The EpiRisk Tool itself will be made available via an online system in the future.