On the predictability of infectious disease outbreaks

Scarpino, Samuel V.; Petri, Giovanni

doi:10.1038/s41467-019-08616-0

Article
Open access
Published: 22 February 2019

Nature Communications volume 10, Article number: 898 (2019)

Abstract

Infectious disease outbreaks recapitulate biology: they emerge from the multi-level interaction of hosts, pathogens, and environment. Therefore, outbreak forecasting requires an integrative approach to modeling. While specific components of outbreaks are predictable, it remains unclear whether fundamental limits to outbreak prediction exist. Here, adopting permutation entropy as a model independent measure of predictability, we study the predictability of a diverse collection of outbreaks and identify a fundamental entropy barrier for disease time series forecasting. However, this barrier is often beyond the time scale of single outbreaks, implying prediction is likely to succeed. We show that forecast horizons vary by disease and that both shifting model structures and social network heterogeneity are likely mechanisms for differences in predictability. Our results highlight the importance of embracing dynamic modeling approaches, suggest challenges for performing model selection across long time series, and may relate more broadly to the predictability of complex adaptive systems.

Introduction

“If we don't have a vaccine–yes, we are all going to get it¹.” This dire assessment by a Canadian nurse in 2003 reflected the global public health community’s worst fears about the ongoing severe acute respiratory syndrome (SARS) outbreak^{2,3}. These fears—for perhaps the first time in history—were partially derived from mathematical and computational models, which were developed in near real-time during the outbreak to forecast transmission risk^{3,4}. However, the predictions for SARS failed to match the data^{3,5}. Over the subsequent 15 years, the scientific community developed a rich understanding of how social contact networks, variation in health-care infrastructure, the spatial distribution of prior immunity, etc., drive complex patterns of disease transmission^{6,7,8,9,10,11}, and demonstrated that data-driven, dynamic, and/or agent-based models can produce actionable forecasts^{12,13,14,15,16,17}. Additionally, studies have demonstrated that predicting different components of outbreaks—e.g., the expected number of cases, the pace and tempo of cases needing treatment, demand for prophylactic equipment, importation probability, etc.—is feasible^{3,13,18,19,20,21,22,23,24}. Despite these advances, a debate continues in the scientific community about both the need and our capacity to forecast outbreaks^{25,26}. What remains an open question is whether the existing barriers to forecasting stem from gaps in our mechanistic understanding of disease transmission and low-quality data or from fundamental limits to the predictability of complex, sociobiological systems, i.e. outbreaks^{4,6,7,27,28,29,30}.

In order to study the predictability of diseases in a comparative framework, which also permits stochasticity and model non-stationarity, we employ permutation entropy as a model-free measure of time-series predictability^{31,32,33}. Permutation entropy is ideal because, in addition to being a model-independent metric of predictability, recent work has demonstrated that it correlates strongly with known limits to forecasting in dynamical systems, e.g., models where we can measure Lyapunov stability^{31,32,33}, and that it can be transformed into an estimate of Kolmogorov-Sinai entropy³⁴. Additionally, recent studies by Pennekamp et al.³³ and Garland et al.³⁵ demonstrated that permutation entropy correlated strongly with forecast accuracy for ecological models and with anomalies in climatological data.

Studying the predictability of a diverse collection of historical outbreaks—including chlamydia, dengue, gonorrhea, hepatitis A, influenza, measles, mumps, polio, and whooping cough—we identify a fundamental entropy barrier for infectious disease time-series forecasting. However, we find that for most diseases this barrier to prediction is well beyond the timescale of single outbreaks, implying prediction is likely to succeed. We also find that the forecast horizon varies by disease and demonstrate that both shifting model structures and social network heterogeneity are the most likely mechanisms for the observed differences in predictability across contagions. Our results highlight the importance of moving beyond time-series forecasting, by embracing dynamic modeling approaches to prediction³⁶, and suggest challenges for performing model selection across long disease time series. We further anticipate that our findings will contribute to the rapidly growing field of epidemiological forecasting and may relate more broadly to the predictability of complex adaptive systems.

Results

Permutation entropy as the predictability of disease time series

Permutation entropy is conceptually similar to the well-known Shannon entropy³¹. However, instead of being based on the probability of observing a system in a particular state, it utilizes the frequency of discrete motifs, i.e. symbols, associated with the growth, decay, and stasis of a time series. For example, in a binary time series the permutation entropy in two dimensions would count the frequency of the set of possible ordered pairs, {[0, 1], [1, 0]}, and the Shannon entropy, or uniformity, of this distribution is the permutation entropy. In higher dimensions, one can define an alphabet of symbols over all d! possible orderings in a given dimension d, e.g., {[0, 1, 2], [2, 1, 0], [1, 0, 2], etc.}, over which the permutation entropy will be defined. A time series that visits all the possible symbols with equal frequency will have maximal entropy and minimal predictability, and a time series that only samples a few of the possible symbols will instead have lower entropy and hence be more predictable.

More formally, for a given time series \(\{x_t\}_{t=1,\ldots,N}\) indexed by positive integers, an embedding dimension d and a temporal delay τ, we consider the set of all sequences of values s of the type \(s = \{x_t, x_{t+\tau}, \ldots, x_{t+(d-1)\tau}\}\). To each s, we then associate the permutation π of order d that makes s totally ordered, that is \(\tilde{s} = \pi(s) = [x_{t_1}, \ldots, x_{t_d}]\) such that \(x_{t_i} < x_{t_j}\ \forall\, i < j\), hence generating the symbolic alphabet. Ties in neighboring values, i.e. \(x_{t_i} = x_{t_j}\), were broken either by keeping them in their original order in the time series or by adding a small amount of noise; the method of tie-breaking did not affect the results (see ref. ³⁷ for more details on tie-breaking and permutation entropy). The permutation entropy of time series \(\{x_t\}\) is then given by the Shannon entropy on the permutation orders, that is \(H_{d,\tau}^{\mathrm{p}}(\{x_t\}) = -\sum\nolimits_\pi p_\pi \log p_\pi\), where \(p_\pi\) is the probability of encountering the pattern associated with permutation π (see Supplementary Figure 1).
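The definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `permutation_entropy` is hypothetical, ties are broken only by keeping the original order (Python's sort is stable), and the result is the unnormalized entropy in nats.

```python
import math
from collections import Counter


def permutation_entropy(series, d=3, tau=1):
    """Unnormalized permutation entropy (nats) for embedding dimension d and delay tau.

    Illustrative sketch: each window s = (x_t, x_{t+tau}, ..., x_{t+(d-1)tau})
    is mapped to the index permutation that sorts it; ties keep original order.
    """
    n_windows = len(series) - (d - 1) * tau
    if n_windows <= 0:
        raise ValueError("series is too short for this (d, tau)")
    # Count how often each ordinal pattern (symbol) occurs.
    patterns = Counter(
        tuple(sorted(range(d), key=lambda i: series[t + i * tau]))
        for t in range(n_windows)
    )
    # Shannon entropy of the observed symbol frequencies.
    return -sum((c / n_windows) * math.log(c / n_windows) for c in patterns.values())


# Toy series: four increasing pairs and two decreasing pairs when d = 2.
example = [4, 7, 9, 10, 6, 11, 3]
h_example = permutation_entropy(example, d=2)
```

For the toy series, the two symbols occur with frequencies 4/6 and 2/6, so the entropy is the Shannon entropy of that two-point distribution. A strictly monotonic series produces a single symbol and therefore zero entropy.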

As described above, calculating the permutation entropy of a time series requires selecting values for the embedding dimension d, the time delay τ, and the window length N over which permutation entropy is calculated. In this study, our goal was to find conservative values of H^p by searching over a wide range of possible (d,τ) pairs and setting \(H^{\mathrm{p}}(\{ x_t\} ) = {\mathrm{min}}_{d,\tau }H_{d,\tau }^{\mathrm{p}}(\{ x_t\} )\). However, the value of H^p should always decline as the embedding dimension d grows, i.e. no minimum value of H^p will exist for finite window sizes N. To address this issue, we follow Brandmaier³⁸ and exclude all unobserved symbols when calculating H^p, which acts as a penalty against higher dimensions and results in a minimum value of H^p for finite-length time series. To control for differences in dimension and for the effect of time-series length on the entropy estimation, we normalize the entropy by log(d!), ensure that each window is greater in length than d!, and confirm that the estimate of H^p has stabilized (specifically that the marginal change in H^p as data are added is <1%). To facilitate interpretation, we present results from continuous intervals by fixing τ = 1. However, our results generalize to the case where we fix both d and τ across all diseases and where we minimize over a range of (d,τ) pairs (see Supplementary Figure 4).
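A minimal sketch of the search over embedding dimensions with τ = 1 might look as follows. It applies only the constraints stated in the text (normalization by log(d!), windows longer than d!, entropy computed over observed symbols) and does not attempt to reproduce Brandmaier's heuristic or the <1% stabilization check. The names `permutation_entropy` and `min_entropy_over_d`, and the candidate range for d, are assumptions made for illustration.

```python
import math
from collections import Counter


def permutation_entropy(series, d=3, tau=1):
    """Permutation entropy normalized by log(d!); only observed symbols contribute."""
    n = len(series) - (d - 1) * tau
    if n <= 0:
        raise ValueError("series is too short for this (d, tau)")
    patterns = Counter(
        tuple(sorted(range(d), key=lambda i: series[t + i * tau]))
        for t in range(n)
    )
    h = -sum((c / n) * math.log(c / n) for c in patterns.values())
    return h / math.log(math.factorial(d))


def min_entropy_over_d(series, d_values=range(2, 7), tau=1):
    """Return (smallest normalized entropy, d that attains it), or None if no d is admissible."""
    best = None
    for d in d_values:
        # Skip dimensions for which the series is not longer than d!.
        if len(series) <= math.factorial(d):
            continue
        h = permutation_entropy(series, d, tau)
        if best is None or h < best[0]:
            best = (h, d)
    return best


# Toy periodic series of length 40: d = 5 and d = 6 are skipped (5! > 40).
toy_result = min_entropy_over_d([0, 1, 2, 3] * 10)
```

Returning `None` when every candidate dimension is excluded keeps the length constraint explicit instead of silently reporting an entropy estimated from too few windows.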

Permutation entropy does not require the a priori specification of a mechanistic or generating model, which allows us to study the predictability of—potentially very different—systems within a unified framework. What is not explicit in the above formulation is that the permutation entropy can be accurately measured with far shorter time series than Lyapunov exponents and that it is robust to both stochasticity and monotonic transformations of the data, i.e. it is equivalent for time series with different magnitudes^{31,39}. Consider—for example—two opposite cases with respect to their known predictability: pure white noise and a perfectly periodic signal. We expect the former, being essentially random, to display a very high entropy as compared with the latter, which we instead expect to show a rather low entropy given its simple periodic structure.

In Fig. 1, we demonstrate that this is indeed the case, even when we allow the periodic signal to be corrupted by a small amount of noise. We track the short-scale predictability of the time series by calculating the permutation entropy in moving windows (with width = 1 year, although the results are robust to variation in window size). For comparison, we calculate the same moving-window estimate of the permutation entropy for the time series of measles cases in Texas prior to the introduction of the first vaccine. The critical observation is that the moving-window entropy for the measles time series fluctuates between values comparable with that of pure random noise and, at times, values closer to the more predictable periodic signal, which suggests alternating intervals with different dynamical regimes and, thus, predictability. The magnitude of the entropy fluctuations for measles in Texas is statistically significant by permutation test, p < 0.001, as compared with simulated fluctuations obtained by building an estimated multinomial distribution over the symbols and repeatedly calculating the expected Jensen–Shannon (JS) divergence from simulations.
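The white-noise versus periodic-signal comparison can be illustrated with a short moving-window sketch. The synthetic signals, the 52-step window (standing in for one year of weekly data), the noise level, and the fixed d = 3 are all assumptions chosen for illustration; they are not the settings behind Fig. 1.

```python
import math
import random
from collections import Counter


def permutation_entropy(series, d=3, tau=1):
    """Permutation entropy normalized by log(d!); ties keep their original order."""
    n = len(series) - (d - 1) * tau
    if n <= 0:
        raise ValueError("series is too short for this (d, tau)")
    patterns = Counter(
        tuple(sorted(range(d), key=lambda i: series[t + i * tau]))
        for t in range(n)
    )
    h = -sum((c / n) * math.log(c / n) for c in patterns.values())
    return h / math.log(math.factorial(d))


def moving_window_entropy(series, width=52, d=3):
    """Entropy of every contiguous window of the given width."""
    return [
        permutation_entropy(series[i : i + width], d)
        for i in range(len(series) - width + 1)
    ]


rng = random.Random(0)
n_steps = 520  # ten "years" of weekly observations (illustrative)
white_noise = [rng.gauss(0, 1) for _ in range(n_steps)]
noisy_periodic = [
    math.sin(2 * math.pi * t / 52) + 0.01 * rng.gauss(0, 1) for t in range(n_steps)
]

noise_h = moving_window_entropy(white_noise)
periodic_h = moving_window_entropy(noisy_periodic)
mean_noise_h = sum(noise_h) / len(noise_h)
mean_periodic_h = sum(periodic_h) / len(periodic_h)
```

Under these settings the windowed entropy of the noisy sine wave stays well below that of white noise, because most of its windows are simply increasing or decreasing.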

Pathogen-dependent entropy horizons

We now turn our attention to a broader set of diseases and ask how the predictability, defined as χ = 1 − H^p (where H^p is the permutation entropy), scales with the amount of available data (i.e. the time-series length). Specifically, we compute the permutation entropy across more than 25 years of weekly data at the US state-level for chlamydia, dengue, gonorrhea, hepatitis A, influenza, measles, mumps, polio, and whooping cough and plot the predictability (χ = 1 − H^p) as a function of the length of each time series. Focusing first on the predictability over short timescales (Fig. 2), for each time series we average H^p over temporal windows of width up to 100 weeks by selecting 1000 random starting points from each state-level time series for each disease and calculating H^p for windows of length 10, 12, ..., 100.
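The windowing procedure just described can be sketched as follows. The series here is a synthetic placeholder, not surveillance data, so no conclusion about any disease should be read from its output; the function name `mean_predictability`, the fixed d = 3, and the reduced number of starting points (100 rather than the 1000 used in the text) are assumptions made to keep the example fast.

```python
import math
import random
from collections import Counter


def permutation_entropy(series, d=3, tau=1):
    """Permutation entropy normalized by log(d!); ties keep their original order."""
    n = len(series) - (d - 1) * tau
    if n <= 0:
        raise ValueError("series is too short for this (d, tau)")
    patterns = Counter(
        tuple(sorted(range(d), key=lambda i: series[t + i * tau]))
        for t in range(n)
    )
    h = -sum((c / n) * math.log(c / n) for c in patterns.values())
    return h / math.log(math.factorial(d))


def mean_predictability(series, length, n_starts, rng, d=3):
    """Average chi = 1 - H^p over windows of one length with random starting points."""
    max_start = len(series) - length
    total = 0.0
    for _ in range(n_starts):
        start = rng.randint(0, max_start)  # inclusive bounds
        total += 1.0 - permutation_entropy(series[start : start + length], d)
    return total / n_starts


rng = random.Random(0)
# Placeholder series: 25 "years" of weekly values with a noisy seasonal cycle.
placeholder = [math.sin(2 * math.pi * t / 52) + rng.gauss(0, 0.3) for t in range(1300)]

window_lengths = list(range(10, 101, 2))  # 10, 12, ..., 100
chi_by_length = {
    w: mean_predictability(placeholder, w, n_starts=100, rng=rng)
    for w in window_lengths
}
```

Plotting `chi_by_length` against window length gives a curve of the same kind as those in Fig. 2, one per time series.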

We find that all diseases show a clear decrease in predictability with increasing time-series length, which implies that accumulating longer stretches of time-series data for a given disease does not translate into improved predictability. However, we also find strong evidence that the majority of single outbreaks—i.e. temporal horizons characteristic for each disease—are predictable. The confidence intervals in Fig. 2 show that there can be large variation in predictability across outbreaks of the same disease, providing a first indication of changing underlying model structures and/or dynamics on the scale of months. We obtained similar results, e.g., decreasing predictability with time-series length, clustering of diseases, and the emergence of barriers to forecasting, using a weighted version of the permutation entropy, which reduces the dependence of the standard unweighted permutation entropy on rare, large fluctuations, and by considering estimates of the permutation entropy where the time delay, τ, is allowed to vary^{32,40} (see Supplementary Figure 4). By comparison, across all models with fixed structures studied to date, e.g., white noise, sine waves, and even chaotic systems, the predictability is constant in time or is expected to improve with increasing amounts of time-series data⁴¹.

Zooming out, what is also conspicuous about the relationship between time-series length and predictability is that diseases cluster together and show disease-specific slopes, i.e. predictability vs. time-series length, which suggests that permutation entropy is indeed detecting temporal features specific to each disease (Fig. 3a). After re-normalizing time for each disease by its corresponding R₀ (the average number of secondary infections a pathogen will generate during an outbreak when the entire population is susceptible, very large, and is seeded with a single infectious individual)—we used the mean of all reported values found in a literature review (see Supplementary Table 1)—we find that the best-fit mixed-effect slope on a log scale is 1 and that the residual effect is well predicted by the time series’ embedding dimension d (see Supplementary Figures 2 and 3). Moreover, because the embedding dimension d of a time series is the length of the basic blocks used in the calculation of the permutation entropy, it encodes the fundamental temporal unit of predictability in the form of an entropy production rate, thus implying that predictability decreases with time-series data at a disease-specific rate determined to first order by R₀, which is further modulated by d. The result that predictability depends on temporal scale also suggests that the permutation entropy could be an approach for justifying the utility of different data sets, i.e. one could determine the optimal granularity of data by selecting the dimension that maximized predictability.

Drivers of disease time-series predictability

One might assume that this phenomenon, i.e. decreasing predictability with increasing time-series length, could be driven purely by random walks on the symbolic alphabet used in the permutation entropy estimation. However, n-dimensional Markov chain models built from the time-series embeddings (n = d, the time-series embedding dimension) consistently produced stable and smaller predictability values in comparison with those obtained from data, corroborating that the predictability behavior we observe does not stem from random fluctuations but is an actual fundamental feature of spreading processes (see Supplementary Figure 6 and Methods for details on the Markov chain simulations). This observation that Markov chain models of the same embedding order do not reproduce the observed predictability indicates that either the model structure is changing in time and/or the system has a very long memory, which is consistent with our current understanding of the entanglement between mobility and disease^{3,42}. That the best-fit n-dimensional Markov chain models over-predict the amount of entropy in real systems also supports our earlier results that predictable structure does exist across most long outbreak time series.

To gain insight into what mechanisms might be driving changes in the predictability, we take advantage of the repeated, natural experiment of vaccine introduction. For diseases, such as measles, where we have data from both the pre- and post-vaccine era, we ask whether the permutation entropy changes after the start of widespread vaccination. We consistently observe that predictability decreases after vaccination, again with significance determined by permutation test (Fig. 4a). We also find that the symbol frequency distribution changes significantly after vaccination, as measured by the Jensen–Shannon divergence, across all states in the United States (Fig. 4b). Critically, because—as stated earlier—permutation entropy is not affected by changes in magnitude, the difference in entropy cannot simply be accounted for by a reduction in cases. Instead, it means that the temporal pattern of cases changes after vaccination. This leads us to the hypothesis that the distribution of secondary infections, its first moment or R₀ and its higher moments, drives predictable changes in the permutation entropy, a phenomenon originally discovered in synthetic directed networks by Meyers et al.⁴³.

To further evaluate the hypothesis that heterogeneity in the number of secondary infections produces predictable changes in permutation entropy, we simulate an SIR model with probabilistic restart at the end of each outbreak (details in the Supplement) on two classes of temporal networks constructed from the Simplicial Activity Driven (SAD) model⁴³, a modified activity driven (AD) model in which an activated node contacts s other nodes and induces new links between the contacted nodes (see Methods). In this model we can control the epidemic threshold and the number of secondary contacts by changing the activity and the number of contacted nodes per activation. We simulated two scenarios, one in which the number of contacted nodes per activation is fixed (regular SAD) and one in which we allow fluctuations in contact number (irregular SAD), which generates fluctuations in the number of secondary infections. For both models, we investigated the predictability from below to above the epidemic or critical transmissibility threshold (set to 1 here). From the resulting epidemic curves, we calculated the permutation entropy. Figure 5a shows that we find the same pattern of decreasing predictability observed in real data with longer time series. Figure 5b shows the predictability obtained for the two scenarios below and above the transition: we see that the strongest difference is present below the transition, where the lack of peculiar structure (the regular contact pattern) induces lower predictabilities than for heterogeneous contact distributions. Above the transition, we find a reduced effect of the difference in contact structure.

Discussion

From these results, we can draw three conclusions. First, differences in the average reproductive number, coupled with heterogeneity in the number of secondary infections, can drive differences in predictability across diseases and outbreaks, which is related to results on predicting disease arrival time on networks⁴⁴ and to recurrent epidemics in hierarchical metapopulations⁴⁵. Second, the permutation entropy could provide a model-free approach for detecting epidemics, which is similar to a recent model-based approach based on bifurcation delays^{46,47,48}. Finally, as outbreaks grow and transition to large-scale epidemics, they should become more predictable, which—as seen in Figs. 1 and 3—appears to be true for real-world diseases as well and agrees with earlier results on how permutation entropy relates to the predictability of nonlinear systems³².

Our finding that horizons exist for infectious disease forecast accuracy and that aggregating over multiple outbreaks can actually decrease predictability is supported by five additional lines of evidence. First, Hufnagel et al., using data on the 2003 SARS outbreak and airline travel networks, demonstrated that heterogeneity in connectivity can improve predictability²⁷. Second, de Cellés et al. noted a sharp horizon in forecast accuracy for whooping cough outbreaks in Massachusetts, USA⁴⁹. Third, Coletti et al. demonstrated that seasonal outbreaks of influenza in France often have unique spatiotemporal patterns, some of which cannot be explained by viral strain, climate, or commuting patterns⁵⁰. Fourth, Artois et al. found that while it was possible to predict the presence of human A(H7N9) cases in China, they were unable to derive accurate forecasts for the temporal dynamics of human case counts⁵¹. Finally, using state-level data from Mexico on measles, mumps, rubella, varicella, scarlet fever, and pertussis, Mahmud et al. showed evidence that while short-term forecasts were often highly accurate, long-term forecast quality quickly degraded⁵².

Research in dynamical systems over the past 30 years has demonstrated that prediction error increases with increasing forecast length⁴¹. However, across that same body of work, researchers typically find that predictions improve when they are trained on longer time series, even for chaotic systems⁴¹. Indeed, even for permutation entropy, an active area of research is how spurious aspects of time series can lead to spuriously increasing predictability with increasing time-series lengths³⁷. Our data-driven results suggest that for infectious diseases the opposite is true: more time-series data might often lead to lower predictability. Then, by integrating our biological understanding of each pathogen and simulated outbreaks, we found that changing dynamics, e.g., empirical changes in vaccination coverage and simulated shifts in the number of secondary infections as a disease moves through a heterogeneous social network, can cause the prediction error to increase with increasing data, which is related to earlier findings on the role of airline travel networks and disease forecasting³⁶. What this implies is that different models generate data at different time points and suggests that the optimal coarse-graining of complex systems might change with scale and/or time⁵³. The potential for scale-dependent models of infectious disease transmission is supported by a recent analysis of US city-level data on influenza outbreaks that found consistent, mechanistic differences in outbreak dynamics based on city size⁵⁴.

The global community of scientists, public health officials, and medical professionals studying infectious diseases has placed a high value on predicting when and where outbreaks will occur, along with how severe they will be^{3,27,55,56}. Our results demonstrate that outbreaks should be predictable. However, as outbreaks spread—and spatiotemporally separated waves become entangled with the substrate, human mobility, behavioral changes, pathogen evolution, etc.—the system is driven through a space of diverse model structures, driving down predictability despite increasing time-series lengths. Taken together, our results agree with observations that accurate long-range forecasts for complex adaptive systems, e.g., contagions beyond a single outbreak, may be impossible to achieve due to the emergence of entropy barriers. However, they also support the utility and accuracy of dynamical modeling approaches for infectious disease forecasting, especially those that leverage myriad data streams and are iteratively calibrated as outbreaks evolve. Lastly, our results also suggest that cross-validation over long infectious disease time series cannot guarantee that the correct model for any individual window of time will be favored, which would imply a no free lunch theorem⁵⁷ for infectious disease model selection, and perhaps for sociobiological systems more generally.

Methods

Permutation entropy

Here we make use of permutation entropy as a model-independent measure of the growth in complexity and unpredictability of infectious disease time series. Given a time series \(\{x_t\}_{t=1,\ldots,N}\) indexed by positive integers, an embedding dimension d and a temporal delay τ, one can consider the set of all sequences of values s of the type \(s = \{x_t, x_{t+\tau}, \ldots, x_{t+(d-1)\tau}\}\). Note that successive values \(x_{t+i\tau}, x_{t+(i+1)\tau}\) for generic i can be in an arbitrary relative order. To each s, one can associate the permutation π of order d that makes s totally ordered, that is \(\tilde{s} = \pi(s) = [x_{t_1}, \ldots, x_{t_d}]\) such that \(x_{t_i} < x_{t_j}\ \forall\, i < j\). In this way, via π we associate a rank-order quantity that is independent of the actual values the time series takes, and we can associate a probability \(p_\pi\) to each permutation by simply counting how many times it appears in the data as compared to the total number of sequences appearing. The permutation entropy of time series \(\{x_t\}\) is then given by the Shannon entropy on the permutation orders, that is \(H_{d,\tau}^{\mathrm{p}}(\{x_t\}) = -\sum\nolimits_\pi p_\pi \log p_\pi\). We find that diseases cluster based on the best-fit dimension, d (see Supplementary Figure 2), and that the disease-specific slopes for a random effects model of (log) entropy and (log) time-series length can be predicted based on the embedding dimension (Supplementary Figure 3).

In the manuscript, we show results obtained by fixing τ = 1 to aid the intuition of the reader and select the most conservative (smallest) value of \(H^{\mathrm{p}}(\{x_t\}) = \min\nolimits_{d} H_{d,\tau=1}^{\mathrm{p}}(\{x_t\})\) by sweeping over a wide range of possible d values. However, the qualitative results do not change even when we allow for a full sweep across (d,τ) pairs and set \(H^{\mathrm{p}}(\{x_t\}) = \min\nolimits_{d,\tau} H_{d,\tau}^{\mathrm{p}}(\{x_t\})\) (see Supplementary Figure 4). In addition, we confirmed that similar results were obtained by using the weighted permutation entropy, as presented in refs. ^{32,40} and implemented in the R package statcomp v. 0.0.1.1000⁵⁸ (see Supplementary Figure 5). It is worth pointing out, however, that weighted permutation entropy attempts to normalize away exactly the kind of structure infectious disease modelers aim to predict.

Markov chain simulations

In order to assess the amount of non-random structure in the real outbreak time series, we build synthetic symbolic time series by simulating Markov chains over the symbol distributions obtained from the empirical time series. For each real time series {x_t}_i, we extract the set of permutation symbols {π} as in the standard calculation for permutation entropy. We utilize τ = 1 and the embedding dimension d_i previously selected during the permutation entropy computation as described by Brandmaier³⁸. For a time series with embedding dimension d, there is a maximum number of d! states, corresponding to the possible permutations of length d. Using the permutations as states, we then count the number of transitions n_ij in the real time series between each pair of symbols (i,j) and use it to build a Markov chain with transition probabilities between states given by \(p_{ij} = \frac{n_{ij}}{\sum\nolimits_j n_{ij}}\). In order to obtain a synthetic symbolic series, we repeatedly start from a randomly selected node and use the Markov chain described above to produce symbolic series with the same number of symbols as the corresponding real time series. For each iteration, we calculate the associated symbolic entropy. In Supplementary Figure 6, we compare the synthetic entropies versus the permutation entropy of the original time series and show that the former are systematically higher than the real ones, implying that there is additional structure in the outbreak time series that is not captured simply by the probabilistic transition structure.
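The Markov chain procedure above can be sketched as follows. The helper names (`symbolize`, `symbolic_entropy`, `simulate_symbol_chain`) are hypothetical, the input is a synthetic placeholder series rather than outbreak data, and the handling of a state with no observed outgoing transition (restarting from a random observed state) is an assumption, since the text does not say how such dead ends are treated.

```python
import math
import random
from collections import Counter, defaultdict


def symbolize(series, d, tau=1):
    """Map a series to its sequence of ordinal patterns (ties keep original order)."""
    n = len(series) - (d - 1) * tau
    return [
        tuple(sorted(range(d), key=lambda i: series[t + i * tau])) for t in range(n)
    ]


def symbolic_entropy(symbols, d):
    """Shannon entropy of the symbol frequencies, normalized by log(d!)."""
    n = len(symbols)
    h = -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())
    return h / math.log(math.factorial(d))


def simulate_symbol_chain(symbols, rng):
    """Simulate a first-order Markov chain with p_ij = n_ij / sum_j n_ij."""
    transitions = defaultdict(Counter)
    for a, b in zip(symbols, symbols[1:]):
        transitions[a][b] += 1  # n_ij: observed transitions from symbol a to b
    states = sorted(set(symbols))
    current = rng.choice(states)  # start from a randomly selected state
    chain = [current]
    while len(chain) < len(symbols):
        outgoing = transitions.get(current)
        if outgoing:
            targets = list(outgoing)
            weights = [outgoing[s] for s in targets]
            current = rng.choices(targets, weights=weights)[0]
        else:
            current = rng.choice(states)  # dead end: assumed restart
        chain.append(current)
    return chain


rng = random.Random(1)
d = 3
placeholder = [math.sin(2 * math.pi * t / 52) + rng.gauss(0, 0.3) for t in range(520)]
observed_symbols = symbolize(placeholder, d)
observed_h = symbolic_entropy(observed_symbols, d)
synthetic_h = [
    symbolic_entropy(simulate_symbol_chain(observed_symbols, rng), d)
    for _ in range(100)
]
```

Comparing `observed_h` with the distribution in `synthetic_h` mirrors the comparison the text reports in Supplementary Figure 6.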

Epidemic simulations

We simulated a standard SIR model with restart on a class of temporal networks in which it is possible to control the expected number of secondary neighbors of nodes. The temporal networks were constructed using the SAD model, a modified version of the well-known activity driven (AD) model⁵⁹, in which activations of nodes can involve two (as in the standard AD model) or more nodes establishing reciprocal links. We simulated two types of networks: in the first, the number of nodes contacted in every activation was kept constant (regular SAD with s = 4); in the second, we allowed the number of contacted nodes to fluctuate between interactions (irregular SAD; we sampled s from a normal distribution with mean 〈s〉 = 4 and coefficient of variation = 0.4). All networks had N = 1000 nodes. Node activities were sampled from a power-law distribution ~a^−α with α = 2.2 and rescaled in order to have an average activity ~10⁻², such that nodes activated on average every 100 time steps.

Crucially, for this class of networks it is possible to calculate explicitly the (SIS) critical threshold λ_c = β₀/γ₀, where β₀ and γ₀ are respectively the matched infection and recovery probabilities at the transition. In order to investigate the behavior of the predictability across the epidemic transition, we fixed γ₀ = 0.1 and let β vary from 0.5β₀ (below the transition) to 4β₀ (far above the transition), where β₀ = λ_cγ₀ is the threshold infectivity matching γ₀. The value of γ₀ was chosen in order to match the average outbreak peak length to those observed in the data (roughly around 4 weeks). We then simulated the SIR model on the networks described above for T = 5000 steps: each outbreak was seeded with five randomly infected nodes and left to run its course; at the end of the outbreak, we repeated the seeding until we reached the prescribed time-series length. We calculated the permutation entropy of the synthetic time series in the same way we processed the empirical ones.
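A heavily simplified sketch of this simulation is given below. It follows the stated parameters where they are given (power-law activities with α = 2.2 and mean 10⁻², s = 4 or s drawn from a normal distribution with coefficient of variation 0.4, γ = 0.1, five seeds per outbreak), but several details are assumptions made only for illustration: the lower activity cutoff `eps`, rounding s to an integer of at least 1, resetting the whole population to susceptible at each restart, recording prevalence rather than incidence, and the arbitrary β values. The critical threshold λ_c is not computed here.

```python
import random


def sample_activities(n, rng, alpha=2.2, eps=1e-3, mean_activity=1e-2):
    """Power-law activities on [eps, 1] (eps is an assumed cutoff), rescaled to the target mean."""
    lo = eps ** (1 - alpha)
    # Inverse-transform sampling from p(a) ~ a^-alpha.
    raw = [(lo + rng.random() * (1 - lo)) ** (1 / (1 - alpha)) for _ in range(n)]
    scale = mean_activity * n / sum(raw)
    return [min(1.0, a * scale) for a in raw]


def draw_s(regular, rng, mean_s=4, cv=0.4):
    """Number of nodes contacted per activation (rounding and the floor at 1 are assumptions)."""
    if regular:
        return mean_s
    return max(1, round(rng.gauss(mean_s, cv * mean_s)))


def simulate_sir_restart(n=1000, steps=5000, beta=0.05, gamma=0.1,
                         regular=True, n_seeds=5, seed=0):
    """Return the number of infected nodes at the end of each time step."""
    rng = random.Random(seed)
    activity = sample_activities(n, rng)
    nodes = range(n)
    infected, recovered = set(), set()
    curve = []
    for _ in range(steps):
        if not infected:
            # Restart: assumed to reset everyone to susceptible before reseeding.
            recovered = set()
            infected = set(rng.sample(nodes, n_seeds))
        newly_infected = set()
        for i in nodes:
            if rng.random() >= activity[i]:
                continue
            s = draw_s(regular, rng)
            # Node i and s other nodes form a fully connected group for this step.
            group = [j for j in rng.sample(nodes, s + 1) if j != i][:s] + [i]
            k = sum(1 for j in group if j in infected)
            if k == 0:
                continue
            p_infection = 1 - (1 - beta) ** k  # at least one of k contacts transmits
            for j in group:
                if j not in infected and j not in recovered and rng.random() < p_infection:
                    newly_infected.add(j)
        just_recovered = {j for j in infected if rng.random() < gamma}
        infected = (infected - just_recovered) | newly_infected
        recovered |= just_recovered
        curve.append(len(infected))
    return curve


# Small illustrative runs (much smaller than N = 1000, T = 5000).
regular_curve = simulate_sir_restart(n=200, steps=300, beta=0.2, regular=True, seed=1)
irregular_curve = simulate_sir_restart(n=200, steps=300, beta=0.2, regular=False, seed=1)
```

The resulting curves can be passed to any permutation entropy routine in the same way as an empirical case-count series.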

Significance tests on moving-window permutation entropy

We use a permutation test to determine whether different time-series windows have distinct symbol distributions. Specifically, we fit a multinomial distribution to the normalized symbol frequency distributions and repeatedly simulate data from the estimated multinomials. Then, we calculate the Jensen–Shannon divergence between each pair of simulated distributions. With these simulated distributions, we can ask how often we see fluctuations in our estimate of the permutation entropy just due to sampling. More formally, we use these simulated distributions as a null distribution for calculating a frequentist p-value based on the observed Jensen–Shannon divergence between the symbolic frequencies in time series windows.
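One way to sketch this test in code is shown below. The function names are hypothetical, and fitting a single pooled multinomial to both windows (so that the null hypothesis is "both windows share one symbol distribution") is an assumption about a detail the text leaves open.

```python
import math
import random
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (nats) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_permutation_test(counts_a, counts_b, n_sim=1000, seed=0):
    """Return (observed JS divergence, p-value) for two vectors of symbol counts."""
    rng = random.Random(seed)
    symbols = range(len(counts_a))
    n_a, n_b = sum(counts_a), sum(counts_b)
    observed = js_divergence(
        [c / n_a for c in counts_a], [c / n_b for c in counts_b]
    )
    # Assumed null model: one multinomial fitted to the pooled symbol counts.
    pooled = [(a + b) / (n_a + n_b) for a, b in zip(counts_a, counts_b)]

    def simulate(n):
        draws = Counter(rng.choices(symbols, weights=pooled, k=n))
        return [draws[s] / n for s in symbols]

    null = [js_divergence(simulate(n_a), simulate(n_b)) for _ in range(n_sim)]
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + sum(x >= observed for x in null)) / (n_sim + 1)
    return observed, p_value


# Two windows with clearly different symbol usage, and two identical windows.
different = js_permutation_test([90, 5, 5], [5, 5, 90])
identical = js_permutation_test([30, 40, 30], [30, 40, 30])
```

With identical windows the observed divergence is zero and the p-value is 1, while the strongly different windows yield a p-value at the resolution limit of the simulation.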

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Code availability

All code associated with this study can be found here: https://github.com/Emergent-Epidemics/infectious_disease_predictability.

Data availability

Empirical data for all diseases—aside from dengue—were obtained from the U.S.A. National Notifiable Diseases Surveillance System as digitized by Project Tycho⁶⁰. Dengue data were obtained from the Pandemic Prediction and Forecasting Science and Technology Interagency Working Group under the National Science and Technology Council⁶¹. All data associated with this study can be found here: https://github.com/Emergent-Epidemics/infectious_disease_predictability.

References

Shaw, J. The SARS scare. Harv. Mag. 109, 48 (2007).
Google Scholar
Dye, C. & Gay, N. Modeling the SARS epidemic. Science 300, 1884–1885 (2003).
Article CAS Google Scholar
Colizza, V., Barrat, A., Barthélemy, M. & Vespignani, A. Predictability and epidemic pathways in global outbreaks of infectious diseases: the SARS case study. BMC Med. 5, 1 (2007).
Chretien, J.-P. et al. Advancing epidemic prediction and forecasting: a new US government initiative. Online J. Public Health Inform. 7, e13 (2015).
Meyers, L. A., Pourbohloul, B., Newman, M. E., Skowronski, D. M. & Brunham, R. C. Network theory and SARS: predicting outbreak diversity. J. Theor. Biol. 232, 71–81 (2005).
Perra, N. & Gonçalves, B. Modeling and predicting human infectious diseases. In Social Phenomena: From Data Analysis to Models (eds Gonçalves, B. & Perra, N.) 59–83 (Springer, Cham, 2015).
Gandon, S., Day, T., Metcalf, C. J. E. & Grenfell, B. T. Forecasting epidemiological and evolutionary dynamics of infectious diseases. Trends Ecol. Evol. 31, 776–788 (2016).
Reich, N. G. et al. Challenges in real-time prediction of infectious disease: a case study of dengue in Thailand. PLoS Negl. Trop. Dis. 10, e0004761 (2016).
Viboud, C. et al. The RAPIDD Ebola forecasting challenge: synthesis and lessons learnt. Epidemics 22, 13–21 (2018).
Peak, C. M. et al. Population mobility reductions associated with travel restrictions during the Ebola epidemic in Sierra Leone: use of mobile phone data. Int. J. Epidemiol. 1, 9 (2018).
Wesolowski, A. et al. Impact of human mobility on the emergence of dengue epidemics in Pakistan. Proc. Natl Acad. Sci. USA 112, 11887–11892 (2015).
Bansal, S., Chowell, G., Simonsen, L., Vespignani, A. & Viboud, C. Big data for infectious disease surveillance and modeling. J. Infect. Dis. 214, S375–S379 (2016).
Funk, S., Camacho, A., Kucharski, A. J., Eggo, R. M. & Edmunds, W. J. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics 22, 56–61 (2018).
Pastore-Piontti, A. et al. in Mathematical and Statistical Modeling for Emerging and Re-emerging Infectious Diseases (eds Chowell, G. & Hyman, J. M.) 39–56 (Springer, Berlin, 2016).
Lofgren, E. T. et al. Opinion: mathematical models: a key tool for outbreak response. Proc. Natl Acad. Sci. USA 111, 18095–18096 (2014).
Chowell, G., Viboud, C., Simonsen, L., Merler, S. & Vespignani, A. Perspectives on model forecasts of the 2014–2015 Ebola epidemic in West Africa: lessons and the way forward. BMC Med. 15, 42 (2017).
Ray, E. L. & Reich, N. G. Prediction of infectious disease epidemics via weighted density ensembles. PLoS Comput. Biol. 14, e1005910 (2018).
Shaman, J., Karspeck, A., Yang, W., Tamerius, J. & Lipsitch, M. Real-time influenza forecasts during the 2012–2013 season. Nat. Commun. 4, 1–26 (2013).
Venkatramanan, S. et al. Using data-driven agent-based models for forecasting emerging infectious diseases. Epidemics 22, 43–49 (2018).
Johansson, M. A., Reich, N. G., Hota, A., Brownstein, J. S. & Santillana, M. Evaluating the performance of infectious disease forecasts: a comparison of climate-driven and seasonal dengue forecasts for Mexico. Sci. Rep. 6, 33707 (2016).
Brooks, L. C., Farrow, D. C., Hyun, S., Tibshirani, R. J. & Rosenfeld, R. Flexible modeling of epidemics with an empirical Bayes framework. PLoS Comput. Biol. 11, e1004382 (2015).
Chowell, G. et al. Using phenomenological models to characterize transmissibility and forecast patterns and final burden of Zika epidemics. PLoS Curr. 8, ecurrents.outbreaks.f14b2217c902f453d9320 (2016).
Zhang, Q. et al. Social data mining and seasonal influenza forecasts: the FluOutlook platform. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (eds Bifet, A. et al.) 237–240 (Springer, Cham, 2015).
Nsoesie, E. O., Beckman, R. J., Shashaani, S., Nagaraj, K. S. & Marathe, M. V. A simulation optimization approach to epidemic forecasting. PLoS ONE 8, e67164 (2013).
Holmes, E. C., Rambaut, A. & Andersen, K. G. Pandemics: spend on surveillance, not prediction. Nature. https://www.nature.com/articles/d41586-018-05373-w (2018).
Rivers, C. M. & Scarpino, S. V. Modelling the trajectory of disease outbreaks works. Nature 559, 477 (2018).
Hufnagel, L., Brockmann, D. & Geisel, T. Forecast and control of epidemics in a globalized world. Proc. Natl Acad. Sci. USA 101, 15124–15129 (2004).
Moran, K. R. et al. Epidemic forecasting is messier than weather forecasting: the role of human behavior and internet data streams in epidemic forecast. J. Infect. Dis. 214, S404–S408 (2016).
Biggerstaff, M. et al. Results from the Centers for Disease Control and Prevention's Predict the 2013–2014 Influenza Season Challenge. BMC Infect. Dis. 16, 357 (2016).
OSTP. Pandemic Prediction and Forecasting Science and Technology Working Group: Towards Epidemic Prediction: Federal Efforts and Opportunities in Outbreak Modeling (National Science and Technology Council, USA, 2016).
Bandt, C. & Pompe, B. Permutation entropy: a natural complexity measure for time series. Phys. Rev. Lett. 88, 174102 (2002).
Garland, J., James, R. & Bradley, E. Model-free quantification of time-series predictability. Phys. Rev. E 90, 052910 (2014).
Pennekamp, F. et al. The intrinsic predictability of ecological time series and its potential to guide forecasting. bioRxiv 350017 https://www.biorxiv.org/content/10.1101/350017v1 (2018).
Politi, A. Quantifying the dynamical complexity of chaotic time series. Phys. Rev. Lett. 118, 144101 (2017).
Garland, J. et al. Anomaly detection in paleoclimate records using permutation entropy. Entropy 20, 931 (2018).
Colizza, V., Barrat, A., Barthélemy, M. & Vespignani, A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl Acad. Sci. USA 103, 2015–2020 (2006).
Zunino, L., Olivares, F., Scholkmann, F. & Rosso, O. A. Permutation entropy based time series analysis: equalities in the input signal can lead to false conclusions. Phys. Lett. A 381, 1883–1892 (2017).
Brandmaier, A. M. pdc: an R package for complexity-based clustering of time series. J. Stat. Softw. 67, 1–23 (2015).
Zunino, L., Soriano, M. C. & Rosso, O. A. Distinguishing chaotic and stochastic dynamics from time series by using a multiscale symbolic approach. Phys. Rev. E 86, 046210 (2012).
Fadlallah, B., Chen, B., Keil, A. & Príncipe, J. Weighted-permutation entropy: a complexity measure for time series incorporating amplitude information. Phys. Rev. E 87, 022911 (2013).
Farmer, J. D. & Sidorowich, J. J. Predicting chaotic time series. Phys. Rev. Lett. 59, 845 (1987).
Szell, M., Sinatra, R., Petri, G., Thurner, S. & Latora, V. Understanding mobility in a social petri dish. Sci. Rep. 2, 457 (2012).
Petri, G. & Barrat, A. Simplicial activity driven model. Phys. Rev. Lett. 121, 228301 (2018).
Shu, P., Tang, M., Gong, K. & Liu, Y. Effects of weak ties on epidemic predictability on community networks. Chaos 22, 043124 (2012).
Watts, D. J., Muhamad, R., Medina, D. C. & Dodds, P. S. Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proc. Natl Acad. Sci. USA 102, 11157–11162 (2005).
Dibble, C. J., O’Dea, E. B., Park, A. W. & Drake, J. M. Waiting time to infectious disease emergence. J. R. Soc. Interface 13, 20160540 (2016).
Brett, T. S. et al. Anticipating epidemic transitions with imperfect data. PLoS Comput. Biol. 14, e1006204 (2018).
Miller, P. B., O’Dea, E. B., Rohani, P. & Drake, J. M. Forecasting infectious disease emergence subject to seasonal forcing. Theor. Biol. Med. Model. 14, 17 (2017).
de Cellès, M. D., Magpantay, F. M., King, A. A. & Rohani, P. The impact of past vaccination coverage and immunity on pertussis resurgence. Sci. Transl. Med. 10, eaaj1748 (2018).
Coletti, P., Poletto, C., Turbelin, C., Blanchon, T. & Colizza, V. Shifting patterns of seasonal influenza epidemics. Sci. Rep. 8, 1–12 (2018).
Artois, J. et al. Changing geographic patterns and risk factors for avian influenza A (H7N9) infections in humans, China. Emerg. Infect. Dis. 24, 87 (2018).
Mahmud, A., Metcalf, C. & Grenfell, B. Comparative dynamics, seasonality in transmission, and predictability of childhood infections in Mexico. Epidemiol. Infect. 145, 607–625 (2017).
Wolpert, D. H., Grochow, J. A., Libby, E. & DeDeo, S. Optimal high-level descriptions of dynamical systems. Preprint at https://arxiv.org/abs/1409.7403 (2014).
Dalziel, B. D. et al. Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science 362, 75–79 (2018).
Altizer, S., Ostfeld, R. S., Johnson, P. T., Kutz, S. & Harvell, C. D. Climate change and infectious diseases: from evidence to a predictive framework. Science 341, 514–519 (2013).
Myers, M. F., Rogers, D., Cox, J., Flahault, A. & Hay, S. Forecasting disease risk for increased epidemic preparedness in public health. Adv. Parasitol. 47, 309–330 (2000).
Wolpert, D. H. & Macready, W. G. No free lunch theorems for optimization. IEEE Trans. Evolut. Comput. 1, 67–82 (1997).
Sippel, S., Lange, H. & Gans, F. statcomp: Statistical Complexity and Information Measures for Time Series Analysis. https://github.com/cran/statcomp/ (2016). R package version 0.0.1.1000.
Perra, N., Gonçalves, B., Pastor-Satorras, R. & Vespignani, A. Activity driven modeling of time varying networks. Sci. Rep. 2, 469 (2012).
van Panhuis, W. G. et al. Contagious diseases in the United States from 1888 to the present. N. Engl. J. Med. 369, 2152–2152 (2013).
Johansson, M. Dengue forecasting project. http://dengueforecasting.noaa.gov/ (2015).

Acknowledgements

We thank Joshua Garland, Pejman Rohani, and Alessandro Vespignani for productive conversations on permutation entropy and helpful comments on an earlier version of the manuscript. S.V.S. received funding support from the University of Vermont and Northeastern University. G.P. received funding support from Fondazione Compagnia San Paolo. S.V.S. and G.P. conducted the study as fellows at IMeRA and drafted the manuscript at Four Corners of the Earth in Burlington, Vermont.

Author information

These authors contributed equally: Samuel V. Scarpino, Giovanni Petri.

Authors and Affiliations

Network Science Institute, Northeastern University, Boston, MA, 02115, USA
Samuel V. Scarpino
Marine & Environmental Sciences, Northeastern University, Boston, MA, 02115, USA
Samuel V. Scarpino
Physics, Northeastern University, Boston, MA, 02115, USA
Samuel V. Scarpino
Health Sciences, Northeastern University, Boston, MA, 02115, USA
Samuel V. Scarpino
Dharma Platform, Washington, DC, 20005, USA
Samuel V. Scarpino
ISI Foundation, 10126, Turin, Italy
Samuel V. Scarpino & Giovanni Petri
ISI Global Science Foundation, New York, NY, 10018, USA
Giovanni Petri

Contributions

Both authors conceived the project, performed the simulations and calculations, analyzed the empirical data, interpreted the results, and produced the final manuscript.

Corresponding authors

Correspondence to Samuel V. Scarpino or Giovanni Petri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Journal peer review information: Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Scarpino, S.V., Petri, G. On the predictability of infectious disease outbreaks. Nat Commun 10, 898 (2019). https://doi.org/10.1038/s41467-019-08616-0

Received: 14 June 2017
Accepted: 14 January 2019
Published: 22 February 2019
DOI: https://doi.org/10.1038/s41467-019-08616-0