What are SARS-CoV-2 genomes from the WHO Africa region member states telling us?
  1. Lu Lu1,
  2. Samantha Lycett2,
  3. Jordan Ashworth1,
  4. Francisca Mutapi3,4,
  5. Mark Woolhouse1,4
  1. 1Usher Institute, Ashworth Laboratories, Kings Buildings, The University of Edinburgh, Edinburgh, UK
  2. 2The University of Edinburgh The Roslin Institute, Roslin, Midlothian, UK
  3. 3Institute of Immunology & Infection Research, The University of Edinburgh School of Biological Sciences, Edinburgh, UK
  4. 4NIHR Global Health Research Unit Tackling Infections to Benefit Africa (TIBA), The University of Edinburgh, Edinburgh, UK
  1. Correspondence to Francisca Mutapi; f.mutapi{at}

Summary box

  • To date, 23 countries from the WHO Africa region have deposited a total of 3995 SARS-CoV-2 sequences to publicly available databases, with the majority of the genomes being from South Africa (56%).

  • Eighty-four different lineages have been identified from these countries, with 86% of genomes belonging to the B.1 lineage and its descendants.

  • There have been multiple separate introductions of SARS-CoV-2 infections into Africa, with approximately 43% of these coming from Europe.

  • 95% of African SARS-CoV-2 genomes have the D614G mutation in spike protein thought to be associated with higher infectivity but lower disease severity.


Pathogen genome sequencing can inform control measures including diagnosis, identifying infection sources and patterns of infection and disease prognosis. The first case of SARS-CoV-2 was reported on 31 December 2019, and on 10 January 2020 the first sequences of the virus were made publicly available allowing the development and standardisation of the real-time PCR diagnostics for the virus. The reagents for the PCR diagnostic test were designed using the full spectrum of the SARS-CoV-2 reference genome collected on 5 January 2020, in Wuhan.1 A recent analysis has shown that the majority of current PCR diagnostic targets have undergone mutations with the nucleocapsid (N) gene primers and probes having undergone the most mutations.2

In Africa, analysing virus genome sequences revealed transmission patterns within individual countries. For example, analyses of sequences in Kenya during the early phase of the epidemic identified both imported and local community transmission, demonstrating transmission from Nairobi to the coastal regions.3 Similarly, South Africa was able to identify nosocomial transmission using genome sequencing.4

The course of infection and disease prognosis can vary depending on the virulence of the pathogen. In addition, viruses may be able to adapt to the environment and evade recognition by the immune system, so it is crucial to understand the molecular basis for any variation. Several studies have already reported some heterogeneity in the circulating strains of SARS-CoV-2.5 However, the important aspect is to relate molecular differences to phenotypic traits that have health implications. For example, a recent study in Singapore compared clinical and immunological indicators in patients infected with wild-type SARS-CoV-2 and in those infected with the Δ382 variant, which has a 382-nucleotide deletion in ORF8, and showed that the Δ382 variant was associated with a milder infection.6

The state of genome sequences available publicly from WHO AFRO countries

Overall, the global effort and speed in pathogen sequencing, and in making the data available in public repositories, have increased markedly during this SARS-CoV-2 pandemic. The average lag between sample collection and sequence availability has been estimated to be 3 months. We have calculated that for previous human-infective viruses, the average lag between sample collection and public sequence availability was over two and a half years in 2015. By 2019, this had fallen to about 1 year.
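The lag described above is simply the difference between two dates in a sequence record. The following is a minimal sketch of that calculation, assuming each record carries a collection date and a public release date; the three records and the function name mean_lag_days are invented for illustration and are not taken from GISAID or from our analysis.

```python
from datetime import date
from statistics import mean

# Hypothetical records: (sample collection date, public release date).
# These values are illustrative only, not real submissions.
records = [
    (date(2020, 3, 1), date(2020, 6, 1)),
    (date(2020, 4, 15), date(2020, 7, 10)),
    (date(2020, 5, 20), date(2020, 8, 25)),
]


def mean_lag_days(pairs):
    """Average number of days between collection and public release."""
    return mean((released - collected).days for collected, released in pairs)


lag = mean_lag_days(records)
# 30.44 is the average length of a calendar month in days.
print(f"mean lag: {lag:.1f} days (about {lag / 30.44:.1f} months)")
```

Applied to real metadata, the same calculation could be grouped by year or by country to compare turnaround times.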

During the SARS-CoV-2 pandemic, several African countries have published in-country sequences faster than the average 3 months. For example, 3 days after the confirmation of Nigeria’s first COVID-19 case, the genome sequencing results of the SARS-CoV-2 specimen from the country were announced on 1 March 2020.7 However, to date, there have been no published analyses of SARS-CoV-2 genomes at a continent level in Africa to summarise the patterns on the continent. We have therefore conducted an analysis of all the publicly available sequences from the WHO Africa Region member countries to summarise the currently available information from SARS-CoV-2 genomes.

Data comprising all SARS-CoV-2 genome sequences and the corresponding metadata from the WHO Africa region (AFRO) on the GISAID sequence data repository were downloaded on 30 November 2020. Spatiotemporal phylogenetic analyses were conducted using the complete genomes available at the time, from African countries and other continents. Over-represented sequences were subsampled. The number of introductions into WHO AFRO member countries was estimated by inferring the ancestral states on the phylogenetic tree using the likelihood reconstruction method.
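Once ancestral states have been assigned to every node of the tree, introductions can be counted as the branches on which the inferred location changes from outside the region to inside it. The sketch below illustrates only this counting step on a toy tree. It assumes the ancestral-state reconstruction has already been done by other software; the tree, the region labels and the function name count_introductions are all hypothetical and do not reproduce our pipeline.

```python
from collections import Counter

# Toy tree: node -> (parent, inferred region). Internal-node regions are
# assumed to come from a prior ancestral-state reconstruction.
# All values are invented for illustration.
tree = {
    "root": (None, "Asia"),
    "n1": ("root", "Europe"),
    "n2": ("n1", "AFRO"),
    "t1": ("n2", "AFRO"),
    "t2": ("n2", "AFRO"),
    "t3": ("n1", "Europe"),
    "t4": ("n1", "AFRO"),
    "t5": ("root", "Asia"),
}


def count_introductions(nodes, target="AFRO"):
    """Count branches entering `target`, keyed by the parent's region."""
    sources = Counter()
    for parent, region in nodes.values():
        if parent is None or region != target:
            continue
        parent_region = nodes[parent][1]
        if parent_region != target:
            sources[parent_region] += 1
    return sources


introductions = count_introductions(tree)
total = sum(introductions.values())
for source, n in introductions.items():
    print(f"{source}: {n} of {total} introductions ({100 * n / total:.0f}%)")
```

In this toy tree, the clade under n2 counts as a single introduction even though it contains two sampled genomes, which is why the number of introductions is usually much smaller than the number of sequences.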

As of 30 November 2020, 3995 SARS-CoV-2 genomes had been submitted to GISAID from 23 countries in the WHO Africa Region (out of 47). These represented ~2% of publicly available sequences globally. The majority of genomes were from South Africa (56%), Democratic Republic of the Congo (9%) and Gambia (9%) (figure 1A,B).

Figure 1

Summary of sequence data for WHO AFRO countries. (A) Heat map of SARS-CoV-2 genomes available on GISAID by African country. (B) Comparison of cumulative reported cases and number of sequences available per country, with the variables on the axes, number of sequences and cumulative cases, log transformed. Different colours represent different countries. (C) Global lineages in African countries (colours represent different African countries as in B) compared with other regions in the world (in grey). (D) Genomes from individual countries highlighted on approximate time-scaled global subsampled tree (n=3266), with tips coloured by the sampling locations as per country-colour key in (C). Lineage B.1 is indicated on the node. (E) SARS-CoV-2 genomes available on GISAID by collection date. Colours for WHO AFRO countries are consistent in (B) to (E); other world regions are in grey.

SARS-CoV-2 in WHO Africa countries

Eighty-four lineages were identified in the sequences from WHO AFRO countries. Currently, 266 lineages (including A, B, C and D lineages and all their descendant sublineages) that have so far contributed to active spread globally have been identified by the PANGOLIN phylogenetic classification.8 Knowledge of circulating lineages is important for multiple reasons, including determining the potential efficacy of vaccines, as has been demonstrated by the challenges that viral diversity poses for vaccine development against other viruses such as HIV-1, influenza or dengue. A study analysing 18 514 SARS-CoV-2 sequences sampled since December 2019 concluded that although there are some reported differences in SARS-CoV-2 genomes, viral diversity is not sufficient to reduce the efficacy of vaccines currently under development.9

As of 30 November, the most prevalent lineage globally was B.1 (figure 1C, D). Our analyses show that 86% of all African SARS-CoV-2 genomes belong to this B.1 lineage which represents a large European lineage that corresponds to the Italian outbreak in February, and its descendant sublineages. In South Africa, B.1.106 was identified in April 2020 as the dominant sublineage, but reports now indicate that this lineage became extinct after the effective control of nosocomial outbreaks.10 These include the novel C.1 lineage, which has 16 mutations compared with the original Wuhan sequence. The C.1 lineage has been reported as the most geographically widespread lineage in South Africa.10 The implications of these changes, if any, in terms of transmission and disease outcome are still to be determined.

Our analyses indicate that there have been multiple separate introductions of SARS-CoV-2 into Africa. We estimated that, based on currently available sequences, there have been at least 211 separate introductions of the virus into WHO AFRO countries from other continents and from African countries outside the WHO AFRO region (figure 1D). Of these, 43% were introduced from Europe, supporting the epidemiological case tracing findings. Most of these introductions do not appear to have spread between countries in Africa. However, 80 introductions from within Africa were tentatively identified, but the small number of sequences currently available from the African countries means that the directionality of these introductions cannot as yet be determined.

What still needs to be done

There is still a need for sequencing of the virus in other African countries and for the sequences to be deposited in publicly available databases. Our analyses indicate that we are still missing sequences from 24 WHO AFRO countries and, furthermore, some of the countries with sequence data have submitted fewer than 10 sequences in total to date. The launch in September 2020 of a network of laboratories to reinforce genome sequencing of SARS-CoV-2 in Africa by the WHO and the Africa Centres for Disease Control and Prevention (Africa CDC) is a welcome development.11 This strengthens the capacity already provided to member states by other partners on the continent, including the African Center of Excellence for Genomics of Infectious Diseases (ACEGID), Nigerian Center for Disease Control (Nigeria CDC) and Tackling Infections to Benefit Africa (TIBA).12

There is also a need to ensure that the sequencing is science-led or purpose-led so that limited resources are not wasted on sequencing for the sake of sequencing. Finally, there is a need to ensure that supporting data are collected and well curated so that patient metadata, including clinical, immunological and prognostic data, can easily be related to the sequence data. This will ensure that the impact of molecular changes on phenotypic attributes can be more readily identified. For example, when SARS-CoV-2 emerged in Wuhan, it had a D residue (aspartate) at position 614 in the sequence of the virus’s spike protein. By June, the D residue had been replaced by G (glycine) on most continents. This mutation is thought to be associated with higher infectivity but lower disease severity.6 13 Our analyses have shown that 94% of African SARS-CoV-2 genomes have the D614G mutation in spike protein.
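Checking for a substitution such as D614G amounts to reading the amino acid at a fixed position of the translated spike protein. The sketch below shows the idea, assuming each spike sequence is complete, ungapped and numbered as in the reference, so that position 614 can be read directly. The toy sequences are placeholders rather than real spike proteins, and the function names are hypothetical.

```python
def residue_at(protein: str, position: int) -> str:
    """Return the amino acid at a 1-based position in a protein sequence."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    return protein[position - 1]


def spike_614_status(spike: str) -> str:
    """Classify a spike sequence by its residue at position 614."""
    residue = residue_at(spike, 614)
    if residue == "D":
        return "D614"
    if residue == "G":
        return "G614"
    return f"other ({residue})"


def toy_spike(residue_614: str) -> str:
    """Build a placeholder sequence with a chosen residue at position 614."""
    return "A" * 613 + residue_614 + "A" * 10


# Invented sample names and sequences, for illustration only.
genomes = {
    "sample_1": toy_spike("G"),
    "sample_2": toy_spike("G"),
    "sample_3": toy_spike("D"),
    "sample_4": toy_spike("G"),
}

statuses = {name: spike_614_status(seq) for name, seq in genomes.items()}
share_g614 = sum(s == "G614" for s in statuses.values()) / len(statuses)
print(f"{100 * share_g614:.0f}% of toy genomes carry G at position 614")
```

Real genomes would first need to be aligned to a reference and translated, and sequences with gaps or ambiguous bases at this position would need to be handled separately.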


For a global pandemic, with potential global solutions, there is a need for comprehensive molecular information on SARS-CoV-2 viruses circulating on all continents. It is hoped that the submission of sequences from Africa, which remains low (figure 1E), will accelerate to generate the much-needed sequence information from the continent.


We gratefully acknowledge the authors and the originating and submitting laboratories in Africa that contributed SARS-CoV-2 genome data to GISAID’s EpiCoV database, on which this study was based (the complete list is in online supplemental table 1). All submitters of data may be contacted directly via the GISAID website.



  • Handling editor Seye Abimbola

  • Twitter @_jordanash, @PIG_Edinburgh

  • Contributors LL, SL and JA conducted the data analysis and were involved in the manuscript preparation. FM collated the information and led the manuscript preparation. MW conceived and supervised the work and was involved in the manuscript preparation.

  • Funding This work was commissioned by the National Institute for Health Research (NIHR) Global Health Research programme (16/136/33) using UK aid from the UK Government. The study also received funding from the Scottish Funding Council GCRF COVID-19 grant at the University of Edinburgh.

  • Disclaimer The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

  • Map disclaimer The depiction of boundaries on this map does not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. This map is provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.