
Country performance against COVID-19: rankings for 35 countries
  1. Dean T Jamison (1)
  2. Lawrence J Lau (2)
  3. Kin Bing Wu (3)
  4. Yanyan Xiong (4)
  (1) Institute for Global Health Sciences, University of California, San Francisco, San Francisco, California, USA
  (2) Lau Chor Tak Institute of Global Economics and Finance, The Chinese University of Hong Kong, Hong Kong, China
  (3) Lead Education Specialist, World Bank, Washington, District of Columbia, USA (retired)
  (4) School of Economics, Zhejiang University, Hangzhou, Zhejiang, China
  Correspondence to Prof Dean T Jamison; meteor17{at}mac.com

Abstract

Objective To generate rankings of 35 countries from all continents (except Africa) on performance against COVID-19.

Design International time series, cross-sectional analysis.

Selected countries Countries having 5500 or more cases (collectively including 85% of the world’s cases) as of 16 April 2020 and that had reached 135 days into their pandemic by 30 July.

Main outcome measures The initial severity and late-pandemic performance of countries can reasonably be ranked by COVID-19 cases or deaths per million population. For guiding policy and informing public accountability during the pandemic, we propose mid-pandemic performance rankings based on doubling time in days of the total number of cases and deaths in a country. Rank orderings then follow.

Results At day 25 into a country’s pandemic, cross-country performance variation was modest: in most countries, cumulative deaths doubled in fewer than 5 days. By day 65, and even more so by day 135, great cross-country variation emerged. By day 135, 9 of the 10 top-performing countries on deaths were European, although they were initially hard hit by the pandemic. Thus, rankings change rapidly enough to point to the value of a dynamic indicator. Five countries—Brazil, Mexico, India, Indonesia and Israel—were among the seven poorest performers at day 135 on both cases and deaths. Doubling times for cases and for deaths are positively correlated, but differ sufficiently to point to the value of both indicators.

Conclusions Readily available data support transparently generated rankings of countries’ performance against COVID-19 based on doubling times of cases and deaths. It is premature to judge the value of these rankings in practice, but the potential and early experience suggest they might help facilitate identification of good policies and inform judgements on national leadership.

  • public health
  • epidemiology
  • medical demography

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Key questions

What is already known?

  • COVID-19 deaths and cases per million population are routinely tracked and used for cross-country comparisons, but mostly among countries of similar income levels.

  • Metrics for pandemic preparedness have been generated and countries have been ranked by those metrics.

  • The doubling time in days of cases and deaths has been used country-by-country as an indicator of how fast the pandemic is proceeding within a country.

What are the new findings?

  • Our analysis generates performance rankings for 35 countries based on doubling times of cases and deaths at days 25, 65 and 135 into their pandemics.

  • The initial severity of the pandemic is assessed in a cross-country comparable way in terms of cases per million population and shown to vary by factors of more than 100.

  • Unlike initial severity, day 25 doubling times show small variation across countries.

  • By days 65 and 135, performance shows large (and increasing) cross-country variation, and rankings of countries by performance become meaningful. Good performance is identified in countries with both high and low levels of income.

What do the new findings imply?

  • Objectively generated measures of country performance have the potential to hold political leaders accountable and to provide metrics by which to judge the impact of alternative policies.

Introduction

On 31 December 2019, Chinese authorities informed the WHO’s Regional Office in Beijing of cases of pneumonia of unknown aetiology appearing in Wuhan, capital of Hubei Province.1 COVID-19’s subsequent upward trajectory has dominated news and political attention for the past 12 months. During this time, countries differed widely in their responses to the pandemic. Some acted quickly, others more slowly. Some paid close attention to WHO guidance and emerging scientific findings, others less so. Some ramped up production and deployment of test kits and personal protection equipment, others assigned this lower priority. Do these choices matter for health and economic outcomes? Our purpose in this study was to provide performance metrics to underpin analysis of the extent to which policy choices matter for health outcomes among a range of potential outcome determinants.

A thoughtful recent assessment reviewed existing approaches to cross-national performance measures against COVID-19 and pointed to a range of indicators that should ideally be included.2 In addition to cases and deaths per million population, other indicators that were proposed included country response capacity and long-term sequelae. The current study uses only a parsimonious subset of the suggested indicators, but the recent review pointed to no previous quantitative rankings of country performance other than cases or deaths per million population. We will later point to reasons why these two specific outcome measures provide limited insight to mid-pandemic performance (although valuable for retroactive assessments). This paper thus provides, we believe, a first attempt to generate mid-pandemic rankings of countries by quantitative measures of outcome. Mid-pandemic measures are, of course, what are needed to guide the evolution of response policies.

Performance metrics potentially serve three distinct purposes. First, by providing concrete indicators of outcomes that can be used across countries, metrics enable both evidence-based assessment of policies and the identification of good practices. Second, such measures facilitate understanding of the importance of contextual factors influencing outcomes, such as age distribution of the population, population density, seasonality, local climate or (conceivably) viral genetics.3 (Statistical analysis is required to disentangle the effects of policy from contextual factors; this might include use of country fixed effects or hierarchical models.) Such understanding could, if timely, provide advance notification of the magnitude of the pandemic problem that may need to be addressed and help guide policy and planning in the right directions. A third purpose for performance rankings is to provide a basis for political accountability, similar to the use of measures of gross national income (GNI) per person and of employment levels in public discourse on economic performance.

Although the principal public health objective of pandemic control is to save lives, COVID-19 morbidity is increasingly understood to be significant, and we thus assess country performance on cases as well as on deaths.4 Ideally, it would be possible to track infections—either cumulative infections from a cross-sectional seroprevalence survey or incidence of infection from repeated seroprevalence surveys, such as recently reported for the Canton of Geneva in Switzerland.5 Realistically, however, comparable seroprevalence data for large numbers of countries are unlikely to become available.

While this paper discusses only performance on health outcomes, adverse economic and social outcomes are now severe. A key input into policy making is understanding the extent to which health and economic goals are mutually reinforcing and the extent to which they conflict. Good indicators of performance on both health and economic outcomes can help inform understanding of the policy trade-offs.

The salience of GNI growth rate (rather than its absolute size) suggests the communication value of indicators based on rates of change. At any given time, the growth rate in a country’s cumulative number of cases, for example, translates into a doubling time defined as how many days it would take the cumulative number of cases to double if that rate were to continue. The longer the doubling time, the better a country is doing. Our purpose in this paper was to provide a framework to analyse mid-pandemic performance based on doubling times for both cases and deaths, but we also discussed alternative metrics.

Methods

Our selection criteria for country inclusion were, first, countries accounting for a significant percentage of global cases (with 5500 cumulative cases or more at individual country level) as of 16 April when the study began and, second, country data being available long enough into the pandemic so that day 135 performance could be calculated. Our choice of day 135 but not further along was both to provide a baseline and to ensure that a sufficient number of countries from different continents could be selected for the study. This resulted in the selection of 35 countries from all continents except Africa. These countries accounted for 85% of the global cumulative cases and 84% of global cumulative deaths as of 16 April.6 The diversity of these 35 countries can shed light on whether their development stages and income levels have bearings on performance.

We chose our mid-pandemic performance metrics based on three criteria: they were available in current cross-country data series, could serve directly as indicators (after adjustment for population size if necessary) or as a derived indicator, and importantly, could adequately reflect performance during the pandemic, not only towards its end. This third criterion warranted further discussion. France and the USA, which had start dates close to each other, exemplified the inadequacy of using cumulative deaths per million population to measure mid-pandemic performance. By 29 April, France had cumulated about 360 deaths per million, about twice the number for the USA. Newspaper and other accounts used this ratio to suggest that the USA was performing better than France. However, 3 months later, on 29 July, the USA actually had more deaths per million than did France, and the USA to France ratio was steadily increasing. (As of 8 October, deaths per million population in France had reached 499, and in the USA, the number had become one-third higher.) Although deaths per million in late April appeared highly favourable to the USA, doubling times tell an entirely different story. In France in late April, the doubling time for deaths was about 42 days, whereas in the USA, deaths were doubling almost twice as frequently (about every 22 days). Doubling time thus foreshadowed the reversal in deaths per million that was to come 3 months later.
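The France and USA comparison above can be turned into a small worked calculation. The sketch below is illustrative only: it assumes the late-April doubling times (42 and 22 days) stay fixed, and it reads "about twice" as exactly twice, so the USA level of 180 deaths per million is an assumption rather than a reported figure. The function name `days_until_equal` is hypothetical.

```python
import math


def days_until_equal(level_a, dt_a, level_b, dt_b):
    """Days until series b catches up with series a under fixed doubling times.

    Solves level_a * 2**(t / dt_a) == level_b * 2**(t / dt_b) for t.
    """
    if dt_a == dt_b:
        raise ValueError("equal doubling times: the ratio never changes")
    return math.log2(level_a / level_b) / (1 / dt_b - 1 / dt_a)


france_level = 360.0  # deaths per million on 29 April, from the text
usa_level = 180.0     # ASSUMPTION: "about twice" taken as exactly half of France
crossover = days_until_equal(france_level, 42, usa_level, 22)
print(round(crossover, 1))
```

Under these simplifying assumptions the two curves would meet after roughly 46 days. Doubling times did not in fact stay fixed, so this shows only the direction that the doubling times pointed to, not a reconstruction of the actual crossover date.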

That deaths per million population performs poorly as a mid-pandemic performance measure implies little about its value late in a pandemic (when deaths per million is rising very slowly if at all). By early August, for example, deaths were rising very slowly in both Germany and France, implying that deaths per million were close to stabilising at a final value. For Germany, this near-final value was 110, whereas for France, the number exceeded that for Germany by a factor of over 4. It is thus reasonable to say that, overall, Germany performed far better than France (bearing in mind the possibility of a reversal if a second wave were to arrive).

We explored various data sources that provide comprehensive coverage of the world, including those from WHO’s daily ‘Situation Reports’ and Johns Hopkins University’s Coronavirus Resource Centre. We settled on Worldometer for pandemic data because it is updated in real time and, importantly, even in the early days when information was not easy to come by, it provided links to its sources, which are usually official websites and occasionally research and news articles. From data underlying its graphs, we constructed time series on our included countries. We checked all the sources of Worldometer and used press reports, for example, from Turkey, and journal articles to provide context to the numbers.7 Data on China came from its National Health Commission.8 As China revised its cases and deaths statistics on Wuhan and Hubei, we incorporated the revisions and smoothed the data, keeping constant the new cumulative totals.9 10 We updated the data on all other countries according to the available information in Worldometer on 15 November 2020. Online supplemental files 1 and 2 show doubling times in days for cases and for deaths for each included country. All population data come from the World Bank.11

Supplemental material


Each country’s start date for its pandemic was defined to be the first day for which the cumulative number of cases had reached 20 or more. The emergence of 20 cases in a country—along with the WHO’s 30 January 2020 declaration of COVID-19 as a public health emergency of international concern—would have provided clear indication to a country’s political leadership of the need for action. We measure the initial severity of the pandemic in a country by its day 25 number of cases per million population. The initial severity and the evolution of doubling time in mid-pandemic determine the late-pandemic performance.
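The start-date rule and the initial-severity measure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input is assumed to be a list of cumulative case counts with one entry per calendar day, and the start date is assumed to count as day 0 (the text does not say whether it is day 0 or day 1).

```python
def start_index(cum_cases, threshold=20):
    """Index of the first day with cumulative cases >= threshold, or None."""
    for i, count in enumerate(cum_cases):
        if count >= threshold:
            return i
    return None


def initial_severity(cum_cases, population, day=25, threshold=20):
    """Cumulative cases per million on `day` of the country's pandemic.

    ASSUMPTION: the start date itself is treated as day 0.
    Returns None if the series never reaches the threshold or is too short.
    """
    start = start_index(cum_cases, threshold)
    if start is None or start + day >= len(cum_cases):
        return None
    return cum_cases[start + day] * 1_000_000 / population


# Synthetic series: reaches 20 cases on index 3, then grows by 10 per day.
series = [0, 5, 19, 20] + [20 + 10 * k for k in range(1, 31)]
print(start_index(series), initial_severity(series, population=2_000_000))
```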

Shortcomings in data available mid-pandemic receive increasing scrutiny in the press and within the academic community. COVID-19 deaths themselves appear in most countries to be biased downward and hence to need upward adjustment as more data come in.12 13 In part, this results from an increasingly understood gap between observed increases in all-cause mortality and reported COVID-19 mortality.14 This can result both from the under-reporting of COVID-19 deaths and from non-coronavirus mortality rate changes that are caused by the pandemic or the response to it. Data on cases may not be strictly comparable across countries as the definition of cases varies across countries; for example, some countries include only cases confirmed by PCR tests, while others include presumptive cases when patients display clinical symptoms. The increasingly understood importance of COVID-19 morbidity led us to include cases, despite case data being inconsistently measured across countries. Although we recognise the potential challenge, we believe it is still worthwhile to work with available though imperfect data in order to come up with timely measures to inform policy.

Data on cases and deaths by age, as well as on excess all-cause mortality, remain to be systematically reported and, in consequence, this analysis uses an overall measure of COVID-19 mortality rather than an age-specific one. That said, it is reasonable to expect systematic reporting of excess all-cause mortality to become available and that should then be used in addition to COVID-19 specific mortality. Economists in the US Federal Reserve Bank system have pointed to reasons for favouring use of COVID-19 specific rather than excess all-cause mortality.15

We use the trajectories of the cumulative numbers of cases and deaths to assess country performance: how quickly, at any point in time, is a country flattening the rise in cumulative cases (or deaths) over time? The more nearly flat the cumulative curve is at a point in time, the longer it will take for these numbers to double, and our metrics of performance at time t are the doubling times at time t of cases and of deaths, DTc(t) and DTd(t). Time is measured in days from the start date. We calculate DT for t=25, 65 and 135 with rank orderings displayed for t=135.

Doubling time is calculated in two steps. Let C(t) and D(t) be the cumulative number of cases and of deaths at time t. Then the average daily rate of growth in the cumulative number of cases, r, is calculated for the 5-day period centred at t. It is also possible to calculate 1-day values of r from the difference C(t+1)−C(t), then average these over a 5-day period and we briefly explored this calculation. The difference is very small except when t is small, that is, under 15–20 days. The value for r is given by

r(t)=ln[C(t+2)/C(t−2)]/4.

From r, doubling time follows:

DTc(t)=ln2/r(t).

Doubling times for deaths were similarly calculated. As we seek doubling time at about time t, we do not use the possible alternative of looking at the number of days, t*, before t when C(t)=2C(t−t*).
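The two-step calculation above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the input is assumed to be a list of cumulative counts indexed by day since the start date.

```python
import math


def growth_rate(cum, t):
    """Average daily growth rate over the 5-day window centred at t.

    Implements r(t) = ln[C(t+2) / C(t-2)] / 4.
    """
    if t - 2 < 0 or t + 2 >= len(cum):
        raise ValueError("need two days of data on each side of t")
    if cum[t - 2] <= 0:
        raise ValueError("cumulative count at t-2 must be positive")
    return math.log(cum[t + 2] / cum[t - 2]) / 4


def doubling_time(cum, t):
    """Doubling time in days, DT(t) = ln 2 / r(t); infinite if there is no growth."""
    r = growth_rate(cum, t)
    return math.inf if r == 0 else math.log(2) / r


# Synthetic check: a series built to double exactly every 10 days.
series = [100 * 2 ** (day / 10) for day in range(40)]
print(doubling_time(series, 25))
```

The same functions apply unchanged to a cumulative deaths series. Returning infinity for a flat series is a design choice made here for illustration; the text itself only notes that very long doubling times become a less useful indicator.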

We explored several alternatives to doubling time involving first and second derivatives of C(t). The amount of change or the first derivative of C(t), C’(t), is widely reported in the press as daily new cases (often per million population). For comparisons over time, and across countries, the actual amount of change appears less useful than the rate of change [C’(t)/C(t)]. In economics, popular and professional discussions of national income focus not on the amount of income change but rather on its rate of change. By analogy, we concluded that an indicator, like doubling time, based on rate of change in deaths or cases, would be more informative than the amount of change. The rise and fall of new cases per day, that is, the change in C’(t), would potentially be an attractive metric. This constitutes the second derivative of C(t), C’’(t), and we extensively explored this possibility. The results were so highly volatile that we concluded that C’’(t) is not usable in practice. As our paper focuses on outcome performance, we do not use process indicators (such as intensive care units or tests per million population). Their value lies in explaining the variations in outcomes we report.
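The alternative metrics discussed above can be approximated from daily data by finite differences. The sketch below is an assumption about how one might compute them, not a description of the authors' procedure; the function names and the short series are made up for illustration.

```python
def daily_new(cum):
    """C'(t) approximated by first differences of the cumulative series."""
    return [b - a for a, b in zip(cum, cum[1:])]


def relative_rate(cum):
    """C'(t)/C(t): daily new counts relative to the previous day's total.

    Days with a zero previous total are skipped to avoid division by zero.
    """
    return [(b - a) / a for a, b in zip(cum, cum[1:]) if a > 0]


def acceleration(cum):
    """C''(t) approximated by second differences."""
    return daily_new(daily_new(cum))


# Made-up cumulative counts, NOT data from the study.
cum = [100, 110, 125, 135, 150]
print(daily_new(cum))      # [10, 15, 10, 15]
print(acceleration(cum))   # [5, -5, 5]
```

Even in this tiny series the second differences flip sign from day to day while the cumulative curve rises steadily, which gives some intuition for why a metric based on C’’(t) can be volatile.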

Patient and public involvement

This study does not involve patients and public consultation because the analysis was based on secondary data from publicly available sources.

Results

Table 1 provides the context of the pandemic. It presents information on a country’s population, start date, cumulative cases and deaths per million population on days 25, 65 and 135. The table shows that initial severity (on day 25), measured in cumulative cases per million population, varies enormously across countries. It is natural to expect initial severity in less populous countries to be higher than that in more populous ones simply because the initial penetration of the pandemic is likely to be quite local. This pattern is evident in the table. Initial severity tends to be exceptionally high in many European countries, particularly in Austria, Belgium, Ireland and Switzerland. Even good performance on doubling times later in the pandemic can only partially compensate in influencing mortality per million population later. Switzerland, for example, had the highest initial severity of any of the 35 countries. Its initial severity was substantially—but incompletely—compensated for by excellent performance by day 65. Germany had achieved relatively few deaths per million population at day 135 by combining (relatively) low initial severity with good performance on doubling times. In contrast, Mexico had low initial severity but sustained low doubling times that resulted by day 135 in high case and death numbers per million population. High initial severity is thus not inconsistent with high doubling times. Hence, they can be expected to independently affect late-pandemic performance.

Table 1

Evolution of the COVID-19 pandemic in 35 countries

Figure 1 displays the evolution of doubling time in the pandemic in the USA from its beginning there in late February through late July. During the first month, both cases and deaths doubled very rapidly, in only a few days, and this was typical of most countries. Likewise typical was a sustained increase in doubling times for 6 weeks or more after the initial low levels. Figure 1 also shows the value of including cases as well as deaths in the analysis by showing a divergence in doubling times, with the doubling time for deaths beginning to increase much more rapidly than for cases. Again, this pattern is typical. Less typical was a sharp decline in performance in the USA, which figure 1 shows to begin for cases in mid-June and for deaths about 3 weeks later. Case doubling time serves as a leading indicator for deaths, and it is perhaps a hopeful sign that the doubling time for cases appears to have stopped declining by mid-July. That said, it remains important to bear in mind not only measurement error, particularly for cases, but also that improved testing over time could lead to slower measured than actual improvement in performance on cases.

Figure 1

Doubling time for 10-day moving average cumulative cases and deaths in the USA.

Table 2 shows our main findings. The 35 countries are ordered from highest to lowest (best to worst) in doubling times for cases and deaths on day 135, with those on days 25 and 65 alongside for comparison. Figure 2A,B graphically illustrates the wide range in performance at day 135, with much less variation at day 65. In general, countries improved substantially between these times, although with marked cross-country variation. Country performance on deaths tracks performance on cases, but only imperfectly. In 5 of the 35 countries, the day 135 ranking on deaths differed by 10 or more from the ranking on cases. Norway and Germany, for example, each ranked 12 places higher on cases than on deaths, whereas the USA ranked 12 places higher on deaths than cases. Maintaining separate indicators is thus of value.

Table 2

Performance of 35 countries against COVID-19

Figure 2

Doubling time for cases (A) and deaths (B) between days 65 and 135 in 35 countries. Doubling time for cases and deaths are capped at 1000 days. See table 2 for details.
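The rank ordering and the comparison between case and death rankings can be sketched as below. The country labels and doubling times are hypothetical placeholders, not values from table 2, and the function name is invented for illustration. Ties, which the text does not discuss, are simply broken by input order here.

```python
def rank_by_doubling_time(dt_by_country):
    """Rank countries so that rank 1 has the longest doubling time (best)."""
    ordered = sorted(dt_by_country, key=dt_by_country.get, reverse=True)
    return {country: i + 1 for i, country in enumerate(ordered)}


# HYPOTHETICAL doubling times in days; not taken from the paper.
cases = {"A": 400.0, "B": 120.0, "C": 35.0, "D": 60.0}
deaths = {"A": 150.0, "B": 500.0, "C": 40.0, "D": 90.0}

rank_cases = rank_by_doubling_time(cases)
rank_deaths = rank_by_doubling_time(deaths)

# Positive gap: the country ranks worse on deaths than on cases.
gap = {c: rank_deaths[c] - rank_cases[c] for c in cases}
print(rank_cases, rank_deaths, gap)
```

A large absolute gap for a country is the kind of divergence that motivates keeping separate indicators for cases and deaths.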

By day 135, eight countries (China, Ireland, Italy, Norway, Germany, the Netherlands, the UK and Spain) had raised the doubling time in cases well above 300 days, or almost 10 months, beyond which doubling time became a less useful indicator because of the small number of cases or deaths then being incurred. By the time of final revision of this paper, however, each of these eight countries, except China, had begun a second wave, after having reduced new cases to well below 5% of the initial peaks. It may be that the best use of the doubling time indicator will be to restart the measure near the beginning of second waves.

Aside from Canada, Western Hemisphere countries all rank in the bottom half, with Brazil and Mexico doing particularly poorly. Among the four countries from the Middle East, Turkey performed reasonably well, ranking above the middle of the 35 countries in both cases and deaths by day 135. Israel, however, experienced a very dramatic reversal, with doubling time of cases falling from a year on day 65 to under a month on day 135. In South Asia, India in particular faces rapidly surging cases and deaths, reflected in short doubling times for both.

As the pandemic extended into day 135, a disturbing pattern of reversal occurred alongside impressive progress in many countries. Eight countries—Switzerland, Republic of Korea, France, Saudi Arabia, Czechia, Australia, Romania and Israel—experienced a resurgence of cases, resulting in lower doubling time between days 65 and 135, although the reversal was more severe in some of these countries than others. Three countries—Iran, Israel and Indonesia—experienced a reversal of doubling time in deaths between days 65 and 135, though the reversal was modest. These trends show just how challenging it is to control the pandemic, even though the multiple experiences of successful control show that suitable policy measures and behavioural responses can succeed.

Figure 3 displays the evolution of doubling time for six countries that span a range of performance levels. Figure 3A shows cases and figure 3B shows deaths between days 25 and 135. Brazil and India started low in performance and remain low. Italy recovered from its catastrophic early pandemic experience with high and rapidly rising doubling times. The UK shows sustained good performance on cases from about day 80 and on deaths from about day 100 (table 1 indicates the date by which each country reaches day 135). Figure 3 illustrates only 6 countries out of 35 but gives a broader sense of how doubling times illuminate performance in the course of the pandemic.

Figure 3

Doubling time in cases (A) and deaths (B) between days 25 and 135 in Brazil, India, Italy, Sweden, Turkey and the UK.

Discussion

There are a number of limitations to the analysis here. First, as previously discussed, data validity remains variable across countries and time. COVID-19 data quality is much discussed by others, and we simply note our study’s possible sensitivity to data quality and our expectation that data quality will improve. That said, we cannot at this point rule out the possibility of bias that might result from correlation between data quality and country performance. The possible direction of bias would be to understate the relative performance of countries with good data quality. Second, available resources lack consistent, cross-country data on mortality by age and ethnicity. Our study is thus limited to population-wide data on cases and deaths. Third, subnational data series, though not now widely available, arguably hold the keys to improved understanding of drivers of national level performance. An expanded version of this paper provides limited subnational assessment with analyses for Hubei (China), Lombardy (Italy) and New York (USA) between days 25 and 65.16 A fourth limitation is that countries that do exceptionally well in forestalling a serious epidemic may not appear among the countries we rank because of paucity of cases and deaths. Examples include Greece, New Zealand, Mongolia and Vietnam. In spite of their shared border with China, both Mongolia and Vietnam contained their cases to just a few hundreds and had no deaths up to day 135. Their experiences are worth learning from, but different methods are needed to identify them. Fifth, measures of R(t), the reproduction number for a pandemic at time t, indicate the extent to which transmission dynamics are unfavourable. Like doubling time, R(t) provides an attractive real-time indicator of progress against a pandemic. When cross-country time series for R become available, it will be important to compare and contrast with doubling time, but we are unable to do so now.

This paper argues that doubling times provide good metrics for mid-pandemic performance, whereas the initial severity (which we define as severity at day 25) is well described by static measures of cumulative cases or deaths per million population. Likewise, we suggest that late-pandemic performance could be well described by the static measures. At the time we prepared a first draft of this paper, our expectation was that ‘late’ pandemic would describe the period after 135 days, but that turned out to be accurate for only perhaps half of our 35 countries. Figure 4 shows the 7-day moving average of new cases per million population at day 135. For 10 countries, the number of new cases per day has fallen below five per million population, whereas five countries (Chile, Peru, the USA, Brazil and Israel) are experiencing over 100 new cases per million per day on a 7-day moving average. For a majority of countries, a late-pandemic phase remains in the future. Doubling time indicators may thus remain useful in judging changes in performance and comparative performance for these countries.

Figure 4

Seven-day moving average of daily new cases in 35 countries on day 135.
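A 7-day moving average of daily new cases per million, as used for figure 4, can be computed directly from the cumulative series. The sketch below assumes a trailing window that ends on the day of interest; the text does not state whether the window is trailing or centred, and the function name is hypothetical.

```python
def new_cases_per_million_ma(cum_cases, population, day, window=7):
    """Trailing `window`-day mean of daily new cases per million, ending on `day`.

    The mean of daily increments over the window equals
    (C(day) - C(day - window)) / window.
    ASSUMPTION: trailing rather than centred window.
    """
    if day - window < 0 or day >= len(cum_cases):
        raise ValueError("not enough data for the requested window")
    new_total = cum_cases[day] - cum_cases[day - window]
    return new_total / window * 1_000_000 / population


# Synthetic series: 70 new cases every day in a population of 10 million.
series = [1000 + 70 * i for i in range(20)]
print(new_cases_per_million_ma(series, population=10_000_000, day=15))
```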

An example of the practical application of our doubling time indicators is an ongoing evaluation of the performance of an epidemic preparedness index published just a year before the COVID-19 pandemic.17 The index is constructed from a range of country characteristics, some of which we would describe as contextual (eg, hospital capacity) and others of which reflect policies directly relevant to pandemic preparedness. The team at Metabiota that developed this epidemic preparedness index is assessing the extent to which the index predicts how well countries are responding to COVID-19. We provided Metabiota with this paper’s performance rankings and doubling time estimates to use as dependent variables in their retrospective evaluation. Metabiota’s analysis is just beginning and its results will be reported in due course. It will then provide an extended example of and insight into the use of our mid-pandemic doubling time metrics. That said, the Metabiota team is now finding a positive correlation between our performance rankings and country rankings on the epidemic preparedness index, with better prepared countries exhibiting longer doubling times. Preliminary analysis also identified important outliers, both positive (notably China) and negative (notably the USA),18 which indicates that actual policies adopted also matter. Analysis of these outliers may help inform future studies of the determinants of performance and potentially identify how specific elements of preparedness affect (or do not affect) country performance.

The broad spread in performance across countries could potentially provide a basis for improved political accountability, for identifying good practices and for the purpose of understanding determinants. Along with its 30 January proclamation of a public health emergency of international concern, WHO conveyed an assessment that specific, timely and well-understood control measures are likely to interrupt transmission.19 Successful country examples of high and increasing doubling times confirm the plausibility of this assessment. Performance differences may result from many factors, including delay in implementation of response, initial preparedness (as reported, for example, by Metabiota), stringency of national response (as reported by the Blavatnik School at Oxford),20 or a range of non-policy contextual factors, such as the age distribution of the population.

Our purpose in this study was to provide mid-pandemic country rankings to facilitate subsequent analysis of mid-pandemic performance and its longer-term consequences. The extent to which the rankings will achieve these objectives remains to be verified.



Footnotes

  • Handling editor Seye Abimbola

  • Contributors All authors contributed to the concept and development of this paper, to the data analysis and to review of the final version. All authors approved the final version.

  • Funding DTJ was supported in part by a Norwegian Agency for Development Cooperation (NORAD) grant to the Bergen Centre for Ethics and Priority Setting, University of Bergen, Norway. The Chinese University of Hong Kong paid the open access fee for publication.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available in a public, open access repository. Data are available upon request. All data relevant to the study are included in the article or uploaded as supplementary information. Time series data on countries from http://www.worldometers.info. Population data from World Development database, World Bank, 23 December 2019. China data from China’s National Health Commission’s 'Daily Briefing on Novel Coronavirus Cases in China' (http://en.nhc.gov.cn).

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.