Introduction
Structured evaluation of a country’s ability to respond to health security threats has garnered a great deal of attention and effort in the last 2 years with implementation of the Joint External Evaluation (JEE) system.1 At the time of this writing, 95 countries had engaged in the full JEE process, involving a national self-study followed by a 5-day, on-the-ground review by international experts.2 JEEs are intended to provide a thorough review and evaluation of a country’s capacities in 19 key areas of public health.3 The score for each of 49 indicators across these 19 domains is measured on a five-point scale combining quantitative and qualitative characteristics. The accompanying narratives summarise major strengths and limitations in each country’s public health systems, and recommendations for improvements are made.
The JEE scores are subsequently published as part of the JEE report and made publicly available on several websites. Much effort has gone into developing and carrying out the JEEs; however, little validation of the JEE scores and recommendations is available to date.
We review a disease outbreak in each of three countries that have undergone the JEE process. We compare scores and recommendations from the JEEs with conditions, identified during postoutbreak reviews, that affected each country’s response to its outbreak. Such a comparison provides a field-based validation of the JEE scores and recommendations in outbreak-related areas.
Scores and recommendations from each country’s JEE were drawn from the JEE summary documents published on the WHO’s website.4 Information on each outbreak was collected using a combination of sources and methods, including the following:
Documented first and last reports from the US Centers for Disease Control and Prevention (CDC)’s Global Disease Detection Operations Center;
On-line media reports, UN Situation Reports and journal articles;
Interviews, following each outbreak, with staff at the CDC Operations Center;
Interviews with international and national responders during the outbreaks. These responders included staff from CDC, other international agencies, and national Ministries of Health. Questions specific to each outbreak were developed for these interviews on the basis of the above sources. In some cases, follow-up questions were posed to these informants in an iterative process to probe earlier responses further and to triangulate information from the various sources.
Finally, preliminary conclusions were shared with field and headquarters staff to further refine an understanding of the collected responses.
Strengths and limitations in national systems relevant to an outbreak were summarised from a review of the text in the JEE document. Summarised information on each outbreak was then compared with the relevant country’s JEE scores and text in these topical areas. The topical areas of relevance included IHR Coordination, National Laboratory Systems, Surveillance, Public Health Workforce, Preparedness, Emergency Operations, Medical Countermeasures and Risk Communication.
Each of the three authors independently made a subjective assessment of the similarity and difference between these two sets of information on a three-level scale. Each author is involved professionally in global health security work and has taken part in JEEs and postoutbreak reviews, though not in the countries evaluated. The three reviewers did not consult one another in creating their agreement scores. Correspondence was high if both the JEE and the description of an outbreak raised a common concern. For example, if both described highly effective systems for laboratory diagnosis, a ‘high’ level of correspondence was recorded. If, instead, an inadequate response from the laboratory system was reported during the outbreak, ‘low’ correspondence was recorded. Similarly, if the JEE reported poor surveillance capacity and surveillance during the outbreak was considered poor, a ‘high’ correspondence was recorded.
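The scoring rule for the agreeing and conflicting cases can be sketched as a small helper function. This is purely illustrative (the function name and boolean encoding are ours, not the study’s), and it covers only the two endpoints described in the text; assignment of the intermediate level of the three-point scale rested on reviewer judgement.

```python
def correspondence(jee_adequate: bool, outbreak_adequate: bool) -> str:
    """Illustrative endpoint rule: 'high' when the JEE assessment and the
    outbreak review agree (both adequate or both inadequate), 'low' when
    they conflict. Inputs encode whether each source judged the capacity
    adequate."""
    return "high" if jee_adequate == outbreak_adequate else "low"

# Example: JEE described effective laboratory diagnosis, but the outbreak
# review reported an inadequate laboratory response.
print(correspondence(True, False))  # -> low
```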
A kappa statistic was generated to assess how likely it was that the observed level of agreement among raters could have occurred by chance. The MAGREE macro in SAS was used, as it provides a multiple-rater kappa statistic that omits missing values.
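For readers without SAS, a multiple-rater kappa of the kind MAGREE reports can be computed by hand as Fleiss’ kappa. The sketch below is a minimal illustration under the assumption of complete data (every subject rated by the same number of raters; MAGREE additionally handles missing values, which this sketch does not). The ratings shown are hypothetical, not the study’s data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters and nominal categories.
    counts[i][j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters."""
    n = len(counts)              # number of subjects rated
    m = sum(counts[0])           # raters per subject
    total = n * m
    # Observed agreement: mean pairwise agreement across subjects.
    p_bar = sum(
        (sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts
    ) / n
    # Expected chance agreement from the marginal category proportions.
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Three hypothetical raters scoring four JEE/outbreak comparisons on a
# three-level correspondence scale (columns: low, moderate, high).
ratings = [
    [3, 0, 0],   # all three raters chose 'low'
    [0, 3, 0],
    [0, 0, 3],
    [3, 0, 0],
]
print(fleiss_kappa(ratings))  # perfect agreement -> 1.0
```

Kappa near 1 indicates agreement well beyond chance; kappa near 0 indicates agreement no better than chance given the marginal rating frequencies.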