Main

A large-scale Ebola virus disease (EVD) outbreak has been ongoing in Western Africa for nearly a year, with more than 23,000 reported cases (ref. 1). Previous findings have shown that the causative agent is a novel Ebola virus (EBOV; ref. 2). Among the three West African countries with widespread and intense EBOV transmission, Sierra Leone reported the largest number of confirmed cases, approximately 58% of all confirmed EBOV infections. To help Sierra Leone fight EVD, the Chinese government dispatched the China Mobile Laboratory Testing Team (CMLTT) in September at the request of the Sierra Leone government. The CMLTT, staffed with medical experts who specialize in laboratory testing, epidemiology, and running a holding and treatment centre, has been working at the Sierra Leone-China Friendship Hospital in Jui Town (represented as a red star in Fig. 1a) in Western Area, approximately 30 km southeast of Freetown, the capital city of Sierra Leone. All activities of the CMLTT were coordinated by the Emergency Operations Centre jointly established by the Ministry of Health and Sanitation of Sierra Leone and the World Health Organization (WHO).

Figure 1: Geographical distribution and phylogenetic analysis of the 2014 EBOV from Sierra Leone.

a, Geographical distribution of the 823 EBOV-positive samples and the 175 newly sequenced genomes (represented as blue dots). In the panel, main roads and waterways are shown as yellow lines and black dashed lines, respectively. b, A Bayesian phylogenetic tree of the 2014 EBOV. The 175 viruses newly sequenced in this study are shown in colour, and others are shown in grey. The seven novel lineages designated in the present study are highlighted. Posterior support for major nodes is shown.

To fight against this novel EBOV, Gire and colleagues systematically analysed 81 EBOV genomes from Guinea (n = 3; ref. 2) and Sierra Leone (n = 78; ref. 4) collected during the early stage of the 2014 EBOV outbreak, revealing the origin, transmission, and rapid accumulation of genetic variation of the 2014 EBOV. However, only a few additional full-length EBOV genome sequences have been published since July 2014, when the outbreak entered a rapid growth phase driven by sustained human-to-human transmission (ref. 4). From 28 September to 11 November 2014, a total of 823 samples tested EBOV-positive by reverse transcription-PCR (RT–PCR) at the CMLTT, from which 175 full-length genomes were successfully sequenced, each from an individual EVD patient (Fig. 1a and Supplementary Table 1). These 175 samples were obtained from five severely stricken districts in Sierra Leone: 47 from Western Urban, 67 from Western Rural, 47 from Port Loko, 5 from Kambia, and 9 from Bombali (Fig. 1a). In four of these districts, approximately one fifth of the EBOV-positive samples were sequenced: 19.5% for Western Urban, 21.2% for Western Rural, 22.1% for Port Loko, and 16.1% for Kambia. For Bombali, 9 out of 17 (52.9%) strains were sequenced. Therefore, our sequenced genomes were roughly proportional to the prevalence in the different regions.
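
The sampling fractions quoted above can be re-derived from the counts given in the text. The snippet below is only an arithmetic check; the variable and function names are illustrative and not part of the original analysis.

```python
# Counts stated in the text.
total_positive = 823          # RT-PCR positive samples tested by the CMLTT
sequenced_by_district = {
    "Western Urban": 47,
    "Western Rural": 67,
    "Port Loko": 47,
    "Kambia": 5,
    "Bombali": 9,
}

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

total_sequenced = sum(sequenced_by_district.values())
overall_fraction = percent(total_sequenced, total_positive)
bombali_fraction = percent(sequenced_by_district["Bombali"], 17)

print(total_sequenced, overall_fraction, bombali_fraction)
```

The overall fraction (175 of 823) is close to the per-district values of about one fifth, with Bombali as the exception noted in the text.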

Phylogenetic analysis of all available full-length EBOV genome sequences from Sierra Leone (n = 253) and Guinea (n = 3) from 2014 was performed using MrBayes (ref. 8), with the three Guinean strains designated as the root (refs 4, 9). Our phylogenetic analysis showed that the 2014 EBOV increased in diversity at least through October after its initial introduction into Sierra Leone (Fig. 1b and Extended Data Fig. 1). Apart from the previously described lineages SL1 and SL2 (ref. 4), the SL3 lineage evolved into two major lineages, SL3.1 and SL3.2, in June in eastern Sierra Leone, both of which were then transmitted to western Sierra Leone. The majority of the EBOV collected from late September to mid-November fell into lineage SL3.2, with a few belonging to lineage SL3.1; none belonged to lineages SL1 or SL2. In particular, the EBOV genomes sequenced in this study could be classified into seven novel independent sublineages based on the phylogenetic topology: two belonging to SL3.1 (SL3.1.1 and SL3.1.2) and five belonging to SL3.2 (SL3.2.1 to SL3.2.5) (Fig. 1b). A phylogenetic tree constructed using the maximum likelihood method showed a similar topology (Extended Data Fig. 2). Therefore, the 2014 EBOV became highly diverse during its first year as it spread in Sierra Leone.

To explore the spatiotemporal relationships of the EBOV in western Sierra Leone, we performed a phylogeographic analysis using BEAST (ref. 10) (Fig. 2 and Extended Data Fig. 3). To reduce the computational load, only 22 of the 78 sequences previously published by Gire et al. (ref. 4) were included in this analysis. We selected representative sequences from the previously described lineages GIN, SL1, SL2 and SL3, ensuring that there was at least one sequence for every sampling date. Temporally, all of the novel sublineages probably emerged before August (Fig. 2). In addition, multiple lineages were co-circulating within a single town or district. All seven sublineages were identified in Waterloo, indicating the highest phylogenetic diversity in this region. Viruses from Freetown belonged to six of the seven sublineages, with sublineage 3.2.3 undetected. Five novel sublineages were also found in Maforki Chiefdom of Port Loko.
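
The text does not specify how the representative sequences were chosen beyond the two constraints (every sampling date and every earlier lineage represented). A minimal sketch of one way to satisfy both constraints is shown below; the `Record` type, the greedy rule and the toy records are assumptions made for illustration and are not the authors' procedure or data.

```python
from collections import namedtuple

# Hypothetical record type: sequence name, lineage label, sampling date (ISO string).
Record = namedtuple("Record", ["name", "lineage", "date"])

def pick_representatives(records):
    """Keep one sequence per sampling date, then add one for any lineage still missing."""
    ordered = sorted(records, key=lambda r: (r.date, r.name))
    first_per_date = {}
    for rec in ordered:
        first_per_date.setdefault(rec.date, rec)  # first record seen for each date
    picked = list(first_per_date.values())
    covered = {rec.lineage for rec in picked}
    for rec in ordered:
        if rec.lineage not in covered:            # make sure every lineage appears
            picked.append(rec)
            covered.add(rec.lineage)
    return picked

# Toy input, invented purely to exercise the function.
toy_records = [
    Record("toy_A", "GIN", "2014-03-17"),
    Record("toy_B", "SL1", "2014-05-26"),
    Record("toy_C", "SL2", "2014-05-26"),
    Record("toy_D", "SL3", "2014-06-10"),
    Record("toy_E", "SL3", "2014-06-10"),
]
subset = pick_representatives(toy_records)
print([rec.name for rec in subset])
```

A greedy rule like this keeps the subset small while guaranteeing that no sampling date or lineage is dropped, which is the property the phylogeographic analysis relies on.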

Figure 2: Phylogeographic reconstruction of the 2014 EBOV using BEAST.
In the left panel, the 175 novel EBOV genome sequences are coloured by geographic region. A transition between colours represents a potential transmission event. In the right panel, the number of sequences from each geographic region in each lineage is summarized.

The spatiotemporal linkage of our sequenced EBOV genomes is further shown in Fig. 3a. First, viruses from Freetown and Waterloo, the capital and the traffic hub, are estimated to be spatiotemporally related, as also observed in sublineages 3.1.1, 3.1.2 and 3.2.4 (Fig. 2), indicating that frequent transmission events might have occurred between the two regions. Second, this network reveals that viral transmission events have also occurred between the three major sites (Freetown, Waterloo and Maforki Chiefdom) and their surrounding regions. Third, our results also suggest spatiotemporal connections of EBOV between Waterloo and each of Port Loko, Kambia and Bombali, as exemplified in sublineages 3.2.1 and 3.2.5 (Fig. 2). Given the higher relative transmission rates inferred for Waterloo, Freetown and Maforki Chiefdom, intensive EBOV surveillance in these three regions should be helpful for the prevention and control of the EVD outbreak in western Sierra Leone.

Figure 3: Reconstructed phylogeographic linkage, substitution rate, and effective population size of the 2014 EBOV in western Sierra Leone from September to November 2014.

a, The phylogeographic linkage constructed using BEAST. Thickness of lines represents the relative transmission rate between two regions. The size of each node is proportional to the sum of the relative rates of the region with Bayes factor >3. b, Substitution rates of the 2014 EBOV. The red line represents the substitution rate estimated using all the 2014 EBOV samples. The estimates of Gire and colleagues were repeated in this study and are shown as the blue line. c, Gaussian Markov random field Bayesian skyride reconstruction of the 2014 EBOV. Bar chart shows the numbers of confirmed cases of EBOV infection and patients. Smooth black line shows the effective population size. Adapted, with permission, from Ebola response roadmap - Situation report, Figure 3; http://www.who.int/csr/disease/ebola/situation-reports/en (accessed 1 April 2015) (ref. 1).

The substitution rate for all of the 2014 EBOV was estimated using BEAST to be 1.23 × 10⁻³ substitutions per site per year (95% highest posterior density interval, 1.04 × 10⁻³ to 1.41 × 10⁻³ substitutions per site per year) (Fig. 3b). Our estimate was similar to the rates estimated between previous EBOV outbreaks, approximately 1.00 × 10⁻³ substitutions per site per year (refs 4, 11, 12, 13, 14). This suggests that, over a longer time interval, EBOV is still evolving at a relatively constant rate.
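
To give a sense of scale, the per-site rate can be converted into an expected number of substitutions per genome. The sketch below assumes a genome length of 18,959 nucleotides for the reference sequence; that length is not stated in the text and should be treated as an assumption, as should the use of the 28 September to 11 November sampling window as an example interval.

```python
from datetime import date

# Rate and 95% HPD bounds quoted in the text (substitutions per site per year).
rate_mean, rate_low, rate_high = 1.23e-3, 1.04e-3, 1.41e-3

# Assumption: approximate length of the reference genome, in nucleotides.
genome_length = 18_959

def substitutions_per_genome(rate_per_site_per_year, years=1.0):
    """Expected substitutions along one lineage over `years` under a strict clock."""
    return rate_per_site_per_year * genome_length * years

per_year = substitutions_per_genome(rate_mean)
per_year_range = (substitutions_per_genome(rate_low), substitutions_per_genome(rate_high))

# Example interval: the sampling window reported for this study.
window_days = (date(2014, 11, 11) - date(2014, 9, 28)).days
over_window = substitutions_per_genome(rate_mean, years=window_days / 365.25)

print(round(per_year, 1), [round(x, 1) for x in per_year_range], window_days, round(over_window, 1))
```

Under these assumptions the rate corresponds to roughly two dozen substitutions per genome per year, so only a handful are expected along a single lineage within the sampling window.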

The estimated effective population size of the 2014 EBOV from Sierra Leone steadily increased from July to early October, and then entered a plateau period (Fig. 3c). This implies that the effective population size of the 2014 EBOV became stable in October, which was broadly consistent with the weekly change in the numbers of confirmed EBOV infection cases and EVD patients in Sierra Leone (Fig. 3c; ref. 1). The doubling time estimated using BEAST was 22.1 days (95% confidence interval, 18.9–25.59 days), which was comparable to that calculated using the epidemiological data from Sierra Leone, with a mean value of 18.9 days.
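
A doubling time relates to an exponential growth rate r through t_d = ln(2) / r. The sketch below applies this standard relation to the 22.1-day estimate; it assumes simple exponential growth and does not reproduce how BEAST arrived at the estimate.

```python
import math

def doubling_time(growth_rate):
    """Doubling time for exponential growth N(t) = N0 * exp(growth_rate * t)."""
    return math.log(2) / growth_rate

def growth_rate(doubling_time_value):
    """Exponential growth rate implied by a given doubling time (same time unit)."""
    return math.log(2) / doubling_time_value

r_per_day = growth_rate(22.1)                    # implied rate per day
fold_change_30_days = math.exp(r_per_day * 30)   # growth factor over 30 days

print(round(r_per_day, 4), round(fold_change_30_days, 2))
```

Applying the same conversion to the 18.9-day epidemiological estimate gives a slightly higher rate, which is one way to compare the genetic and case-count estimates on a common scale.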

We then characterized the novel EBOV genomes at the molecular level. Raw reads of each genome were mapped to the reference genome (KJ660346.2). The average normalized coverage was approximately 1,400-fold (Fig. 4a). A total of 341 single nucleotide polymorphisms (SNPs) had previously been identified between the 2014 outbreak EBOV and previous EBOV (ref. 4), and 440 SNPs were identified in our sequenced genomes. The substitutions in the 175 newly sequenced EBOV genomes were summarized by lineage (Fig. 4b and Supplementary Table 2). Approximately a quarter of the identified substitutions were non-synonymous and about half were synonymous (Extended Data Fig. 5). Some of the SNPs were lineage-specific and could be used as markers to distinguish different lineages (Fig. 4b and Supplementary Table 2). For example, substitutions A7148G and A17445G were found only in sublineage 3.1.2, whereas sublineage 3.2.4 possessed a specific T5849C substitution. The T > C substitutions in the 3′ UTR of the NP gene (at genome positions 3008 and 3011) were specific to sublineage 3.2.5. In particular, the T > C substitution at position 14019, first described in this study, occurred in all sequences of lineage 3.2. Moreover, seven previously reported substitutions (at positions 800, 1849, 6283, 8928, 10218, 15963 and 17142; ref. 4) were always present in the novel lineages from June to November 2014 and became the dominant alleles in the population, suggesting that they have been fixed. These substitutions included two non-synonymous substitutions (C800T in the NP gene and C6283T in the GP gene), four synonymous substitutions, and one substitution in a non-coding region.
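
The idea of a lineage-specific marker can be made concrete as a substitution carried by every genome of one sublineage and by no genome outside it. The sketch below implements that definition on a toy input; the genome names and their assignments are invented for illustration, although the substitution labels follow those mentioned in the text.

```python
def lineage_markers(genomes):
    """Map each lineage to substitutions shared by all its genomes and absent elsewhere.

    `genomes` maps a genome name to (lineage, set of substitution labels).
    """
    by_lineage = {}
    for lineage, substitutions in genomes.values():
        by_lineage.setdefault(lineage, []).append(substitutions)

    markers = {}
    for lineage, members in by_lineage.items():
        shared = set.intersection(*members)
        elsewhere = set().union(
            *(subs for other, group in by_lineage.items() if other != lineage for subs in group)
        )
        markers[lineage] = shared - elsewhere
    return markers

# Toy genomes (hypothetical names and assignments).
toy_genomes = {
    "toy_1": ("3.1.2", {"C800T", "A7148G", "A17445G"}),
    "toy_2": ("3.1.2", {"C800T", "A7148G", "A17445G"}),
    "toy_3": ("3.2.4", {"C800T", "T14019C", "T5849C"}),
    "toy_4": ("3.2.4", {"C800T", "T14019C", "T5849C"}),
    "toy_5": ("3.2.5", {"C800T", "T14019C", "T3008C", "T3011C"}),
}
markers = lineage_markers(toy_genomes)
print(markers)
```

In the toy input, C800T is carried by every genome and T14019C by both 3.2 sublineages, so neither is reported as a marker of any single sublineage.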

Figure 4: Genomic variations of the 2014 EBOV.

a, Sequence depth across sequenced genomes. The x axis represents the virus genome structure, and the y axis represents the normalized average depth. One unit equals approximately 1,400-fold coverage per site. The mean depth is shown as the red line and the standard deviation is shown as shading. b, Substitutions of the 2014 EBOV. Only positions with substitutions are shown. Different lineages are separated by lines. Different types of substitutions are indicated using different colours: cyan for synonymous (S), magenta for non-synonymous (NS), green for untranslated regions (UTR), and grey for intergenic regions (IG). c, All the serial T > C substitutions are found within a range of less than 150 bp. Substitutions within coding regions are shown in codons.

Interestingly, we observed several serial T > C substitutions in six newly sequenced EBOV genomes, each series occurring within a genome region of 150 base pairs in length (Fig. 4c and Extended Data Fig. 4). The serial T > C substitutions were further confirmed by Sanger sequencing after PCR amplification (Extended Data Table 1). Such serial substitutions were found in four different genome regions, two coding and two non-coding, in six strains belonging to three different lineages. However, the mechanism by which such serial T > C substitutions emerge and their potential biological functions warrant further investigation.
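
One way to flag such runs computationally is to scan the T > C substitutions of a genome for groups that fall within a 150-bp window. The sketch below is an illustration only: the minimum of three substitutions per cluster is an assumption (the text says only "several"), and the positions in the toy input are invented.

```python
def serial_tc_clusters(substitutions, window=150, min_count=3):
    """Group T>C substitutions into clusters that each span less than `window` bp.

    `substitutions` is an iterable of (position, ref_base, alt_base) tuples.
    `min_count` is an assumed threshold, not a value taken from the text.
    """
    positions = sorted(pos for pos, ref, alt in substitutions if (ref, alt) == ("T", "C"))
    clusters = []
    start = 0
    while start < len(positions):
        end = start
        # Extend the cluster while the next site is still inside the window.
        while end + 1 < len(positions) and positions[end + 1] - positions[start] < window:
            end += 1
        if end - start + 1 >= min_count:
            clusters.append(positions[start:end + 1])
            start = end + 1
        else:
            start += 1
    return clusters

# Toy substitutions (hypothetical positions).
toy_substitutions = [
    (1000, "T", "C"), (1040, "T", "C"), (1100, "T", "C"), (1130, "T", "C"),
    (5000, "T", "C"), (9000, "A", "G"), (9050, "T", "C"),
]
clusters = serial_tc_clusters(toy_substitutions)
print(clusters)
```

Any cluster flagged this way would still need independent confirmation, as was done here by Sanger sequencing.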

In summary, our findings highlight the increasing genetic diversity and the transmission dynamics of the 2014 EBOV, with an evolutionary rate estimated to be similar to that between previous EBOV outbreaks. This information provides insight into viral evolution and transmission dynamics, which should facilitate the prevention and control of EBOV in Sierra Leone and guide research on vaccines and therapeutic targets.

Methods

Ethics statement

This work was conducted as part of the surveillance and public health response to contain the EVD outbreak in Sierra Leone. Blood samples from individuals with suspected EVD and oropharyngeal swab samples from corpses were collected for EVD testing and outbreak surveillance, with a waiver of written informed consent during the EVD outbreak under the agreement between the Sierra Leone government and the Chinese government. The activities were coordinated by the Emergency Operations Centre under the Sierra Leone Ministry of Health and Sanitation and WHO. All information regarding individual persons has been anonymized in the report.

Genome sequencing and assembly

RNA samples extracted from whole blood from 175 EVD patients were reverse transcribed to cDNA. PCR amplifications were performed with overlapping EBOV-specific primer pairs. Amplicons from each patient were pooled for library preparation. Next-generation sequencing (NGS) was performed using the BGISEQ-100 (Ion Proton) platform. All sequenced reads were filtered to remove low-quality and short reads. The genome sequences of the viruses were assembled by mapping the filtered reads to the 2014 EBOV consensus sequence using Roche 454 Newbler version 2.9 (Roche), and mutation sites were manually checked against the original sequencing data.

Phylogenetic and phylogeographic reconstruction

All previously published EBOV genome sequences and our newly released 175 sequences were aligned using MAFFT v7.058 (ref. 15). Phylogenetic analyses were performed using MrBayes v3.2 (ref. 8; 10 million generations) and RAxML v8.1.6 (1,000 bootstrap replicates), with the GTR model of nucleotide substitution and γ-distributed rates among sites. Phylogeographic reconstruction of the 2014 EBOV was estimated using BEAST v1.8.0 (ref. 10), with a continuous-time Markov chain (CTMC) over discrete sampling locations. The 175 newly sequenced samples in this paper were grouped into 7 regions (Waterloo, Freetown, Rest of Western, Maforki Chiefdom, Rest of Port Loko, Bombali and Kambia). Bayesian Markov chain Monte Carlo analysis was run for 100 million steps, sampled every 10,000 steps, with the first 10% removed as burn-in. Bayes factor tests were performed using SPREAD v1.0.6 (ref. 16) to provide statistical support for potential transmission routes between different geographic locations. Bayes factors for rates were derived from a Bayesian stochastic search variable selection procedure. The phylogeographic linkage was constructed from routes with Bayes factor values >3.
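
The chain settings and the Bayes factor threshold can be illustrated with a short calculation. A Bayes factor for a route is the ratio of posterior odds to prior odds that its rate is included in the model. The prior inclusion probability used below follows a convention often used with stochastic search variable selection (a Poisson prior with mean ln 2 on the number of rates beyond the K − 1 needed to connect K locations, with symmetric rates); the text does not state the prior that was used, so this choice and the example posterior probability are assumptions.

```python
import math

# Chain settings stated in the text.
chain_length = 100_000_000
sample_every = 10_000
burn_in_percent = 10

n_sampled = chain_length // sample_every
n_kept = n_sampled - n_sampled * burn_in_percent // 100   # states left after burn-in

def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds that a rate is included."""
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds

# Assumed prior for K = 7 regions with symmetric rates.
n_regions = 7
n_rates = n_regions * (n_regions - 1) // 2
prior_inclusion = (math.log(2) + n_regions - 1) / n_rates

# Hypothetical posterior inclusion probability for one route.
example_posterior = 0.80
example_bf = bayes_factor(example_posterior, prior_inclusion)
supported = example_bf > 3

print(n_sampled, n_kept, n_rates, round(prior_inclusion, 3), round(example_bf, 2), supported)
```

With these assumptions, a route needs a posterior inclusion probability well above the prior of about 0.32 to pass the Bayes factor >3 threshold used for the linkage network.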

Substitution rates and population dynamics

The substitution rates were estimated using Bayesian Markov chain Monte Carlo (MCMC) as implemented in BEAST v1.8.0. For this analysis, two data sets were compiled: one including all the 2014 EBOV sequences and the other including only sequences from September to November 2014. We performed two independent runs of 100 million generations, sampling every 10,000 steps. In addition, to accurately estimate the substitution rate, we repeated this analysis on a previously described data set with the same parameters. Population dynamics of the 2014 EBOV in Sierra Leone were estimated using a flexible non-parametric Bayesian skyride model (ref. 17) incorporated in BEAST v1.8.0, with the HKY+Γ model and a strict molecular clock.

Molecular characterizations of the 2014 EBOV

SNPs were called directly from the sequence alignment using the CLC Genomics Workbench v7.5.1, Geneious R8 and Newbler v2.9. The earliest strain of the 2014 EBOV, H.sapiens-wt/GIN/2014/Makona-Kissidougou-C15 (GenBank accession number KJ660346.2), was used as the reference genome. The synonymous substitutions, non-synonymous substitutions, and substitutions in non-coding regions were marked with coloured dots.
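
The SNP calling and classification described above relied on the listed software packages. As a conceptual illustration only, the sketch below calls SNPs from a pair of aligned sequences and labels each one as synonymous, non-synonymous or non-coding using the standard genetic code. The function names, the single forward-strand coding region and the toy sequences are all assumptions made for this example and do not correspond to EBOV gene coordinates.

```python
from itertools import product

# Standard genetic code, with codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def call_snps(reference, sample):
    """Return (1-based position, ref base, sample base) where two aligned sequences differ."""
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt and ref in "ACGT" and alt in "ACGT"   # skip gaps and ambiguity codes
    ]

def classify(reference, position, alt, cds_start, cds_end):
    """Label a SNP relative to one forward-strand CDS given as 1-based inclusive coordinates."""
    if not cds_start <= position <= cds_end:
        return "non-coding"
    offset = (position - cds_start) % 3          # position of the SNP within its codon
    codon_start = position - offset - 1          # 0-based index of the codon's first base
    ref_codon = reference[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"

# Toy aligned sequences: a short CDS (ATG CTT GAA TAA) at positions 3-14.
toy_reference = "CCATGCTTGAATAAGG"
toy_sample    = "TCATGCTCAAATAAGG"
snps = call_snps(toy_reference, toy_sample)
labels = [classify(toy_reference, pos, alt, 3, 14) for pos, ref, alt in snps]
print(list(zip(snps, labels)))
```

A real annotation would need the coordinates of all seven EBOV genes and their untranslated regions, which is what the coloured categories in Fig. 4b summarize.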