Two years after the start of the COVID-19 pandemic, key questions about the emergence of its aetiological agent (SARS-CoV-2) remain a matter of considerable debate. Identifying when SARS-CoV-2 began spreading among people is one of those questions. Although the current canonically accepted timeline hypothesises viral emergence in Wuhan, China, in November or December 2019, a growing body of diverse studies provides evidence that the virus may have been spreading worldwide weeks, or even months, prior to that time. However, the hypothesis of earlier SARS-CoV-2 circulation is often dismissed with prejudicial scepticism and experimental studies pointing to early origins are frequently and speculatively attributed to false-positive tests. In this paper, we critically review current evidence that SARS-CoV-2 had been circulating prior to December 2019, and emphasise that, despite some scientific limitations, this hypothesis should no longer be ignored and that the available evidence is sufficient to warrant further larger-scale studies to determine its veracity.
Data availability statement
There are no data in this work.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
A growing body of studies provides evidence for the global circulation of SARS-CoV-2 prior to December 2019, contradicting the currently hypothesised timeline of the original viral emergence in Hubei province of China around November 2019; however, any suggestion of an earlier SARS-CoV-2 circulation is met with scepticism.
Several studies performed independently by different groups retrospectively demonstrated the presence of antibodies and viral RNA in clinical samples and showed SARS-CoV-2 community circulation by detecting viral RNA in wastewater at times inconsistent with November 2019 emergence.
Despite some limitations, combining the knowledge acquired from these studies is sufficient to warrant further larger-scale investigations to determine the veracity of this hypothesis.
If proven true, an earlier than currently believed worldwide spread of SARS-CoV-2 will provide essential clues for understanding the genesis of this pandemic and offer invaluable lessons from our successes and failures with crucial implications for future pandemic preparedness and global health.
Tomorrow, when I wake, or think I do, what shall I say of today? –Waiting for Godot, Samuel Beckett.
During the spring of 2020, the sudden outburst of one of the most devastating pandemics of modern times changed our lives, possibly forever. Almost 2 years after the beginning of this transition into the ‘new normal’, with enormous daily case and death counts, tight disease surveillance and lockdowns, there are still many unresolved questions about how this could have happened. Where did the virus come from? When, where and how did the virus jump the species barrier? When did the virus become highly capable of infecting humans? Did the worldwide respiratory and general disease surveillance systems allow the virus to spread undetected? If history is an indication, some of these questions may remain unanswered for a long time to come.
Late in December 2019, Chinese clinicians began reporting that a pneumonia of unknown origin had affected people in Wuhan, Hubei, and on 31 December, the WHO became aware of the situation after identifying a ProMED post and online reports about the outbreak.1 The WHO acted quickly, and on 5 January 2020, informed all its Member States about the outbreak with the general reminder that ‘public health measures and surveillance of influenza and severe acute respiratory infections still apply’.2 Quite rapidly, the genome of a novel coronavirus, later named SARS-CoV-2, was sequenced and determined to be the causative agent. Subsequently, health authorities realised that this new agent was capable of human-to-human transmission. As a result, the first border control measures for screening air passengers coming from Wuhan were implemented in many countries. On 30 January 2020, as recommended by its Emergency Committee, the WHO declared the outbreak a ‘Public Health Emergency of International Concern’ and alerted countries to be prepared for the deployment of emergency measures such as containment, active surveillance and contact tracing.1 However, by then, the virus had already been spreading worldwide3 and by 27 February 2020, cases of COVID-19 had already been reported in 53 countries, prompting the WHO on 11 March to characterise the disease as a global pandemic.1
The origin of SARS-CoV-2 is still hotly debated and there is no dispositive proof of whether the virus started its spread after a single or multiple zoonotic events or if the virus ‘escaped’ from a research laboratory through accidental exposure or breach of safety protocols. So far, zoonotic emergence is considered the most likely option by some scientists.4 Some of the first COVID-19 cases in Wuhan were, in fact, linked to a seafood market that sold a wide array of live animals, and similar zoonotic origins have already been proven for other human coronaviruses.5 However, targeted scientific investigations concluded that the market might have simply acted as an amplification site.1 Additionally, although close relatives of SARS-CoV-2 have been detected in animals, with bats harbouring the closest relatives to this virus, genomes sufficiently close to early SARS-CoV-2 isolates to act as zoonotic sources have not yet been identified in any species, making it difficult to pinpoint any potential intermediate hosts.1 4 6 7 Since genomically closest bat coronaviruses have been identified thousands of kilometres away from Wuhan and no SARS-CoV-2-infected animals have been linked to the wet markets in Wuhan, the question of how the progenitor virus arrived in Wuhan remains unanswered. Therefore, to clarify the mechanisms at the origin of this virus, identifying more accurately when and where the first human cases occurred is crucial to determine the original infection source(s).
Evidence of earliest SARS-CoV-2 infections
Although considerable uncertainty around the early phases of the epidemic in Wuhan remains, the earliest patient with laboratory-confirmed COVID-19 reported in literature developed symptoms on 1 December 2019.8 However, several literature reports provide clues for virus circulation during that time and earlier in other parts of the world9 (table 1). In the USA, SARS-CoV-2 reactive antibodies were detected in over 100 blood samples collected in several different states in early December 2019.10 In Brazil, environmental surveillance monitoring demonstrated early SARS-CoV-2 community spread at the end of November 2019 by detecting viral RNA in wastewater.11 A study performed in the UK identified a few blood donors in May 2019 whose sera presented SARS-CoV-2 S-reactive antibodies associated with a presumed current immune response.12 In France, antibodies against SARS-CoV-2 were found in serum samples collected in November 2019, and viral RNA was detected in December 2019 in a respiratory sample from a patient hospitalised for haemoptysis.13 14 However, most of the studies that investigated and found evidence for early SARS-CoV-2 circulation were performed in Italy, the first European country reporting sustained community transmission.
The first official and confirmed non-travel-related Italian COVID-19 case was identified on 20 February 2020, in Codogno, a small town located in Lombardy, the most populous region of Italy. Italy quickly became the epicentre of the COVID-19 European epidemic and by March 2020, it surpassed China in the number of officially reported cases. Within Italy, Lombardy was the most affected region during the first pandemic wave.1 15–18 During March–May 2020, Italy registered a 31.7% increase in all-cause deaths compared with the same period in the quinquennium 2015–2019 (excess deaths due to all causes), an increase to which the heaviest contributors were the regions from the north (+61.1%) and especially Lombardy, which experienced a 111.8% increase.18 Lombardy was the first area of the Western World to be severely affected by the pandemic and one of the regions that suffered its heaviest consequences. These reasons are likely why many research groups were prompted to retrospectively investigate the initial phases of the COVID-19 outbreak in Italy.
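The excess-mortality percentages quoted above compare deaths observed in a period with a baseline taken from the same period in earlier years. A minimal sketch of that arithmetic follows, assuming the baseline is the simple mean of the five reference years. The function name and the death counts are hypothetical illustrations chosen only to reproduce a 31.7% figure; they are not the actual Italian data, and the cited source may define its baseline differently.

```python
from statistics import mean


def percent_excess_deaths(observed_deaths, baseline_deaths_by_year):
    """Percentage increase of observed deaths over the mean of the baseline years.

    Assumption: the baseline is the arithmetic mean of the reference years.
    """
    baseline = mean(baseline_deaths_by_year)
    if baseline <= 0:
        raise ValueError("baseline deaths must be positive")
    return (observed_deaths - baseline) / baseline * 100.0


# Hypothetical counts (NOT real data): five baseline years averaging 1000 deaths
hypothetical_baseline = [980, 1010, 1000, 990, 1020]
hypothetical_observed = 1317

print(round(percent_excess_deaths(hypothetical_observed, hypothetical_baseline), 1))
# prints 31.7
```

Because the measure is relative to each area's own baseline, a region can show a far larger percentage than the national figure while contributing only part of the absolute excess.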
From the very beginning, it was clear that sustained viral spread started in Lombardy weeks before the first detection of the virus in the Codogno case. Indeed, the estimated net reproduction number had been above the epidemic threshold since late January 2020 and contact tracing demonstrated ongoing transmission throughout January 2020.19 20 A study performed later by the Italian National Institute of Health demonstrated that the virus was circulating in Milan (Lombardy) and Turin (Piedmont) by the end of 2019 in sufficient numbers to allow the detection of viral RNA in sewage water samples collected on 18 December 2019.21 Finally, several studies performed independently by different groups demonstrated the presence of antibodies against SARS-CoV-2 in blood samples since September 201922–24 and of viral RNA in clinical samples collected from patients with cutaneous manifestations (dermatosis and measles-like rash) as early as 12 September 2019.24–26
Criticisms and supporting facts
The spread of SARS-CoV-2 outside of China prior to December 2019 directly contradicts the currently hypothesised timeline of the original viral emergence in China’s Hubei province around November 20193 27 28 and its subsequent introduction in Europe and North America in January 2020.29 30 However, sequence-based relaxed molecular clock methods used to date ancestral nodes in phylogenetic trees are influenced by sequence availability and temporal signal present in the data. The addition of even a few early strains, which may have gone undetected at the beginning, can significantly shift the relative likelihoods of these predictions.24 31 More recent computational analyses including a very large number of complete genomes have moved the time of viral emergence to well before the major Wuhan outbreak, up to the summer of 2019.32–34
Because of these disagreements, laboratory evidence for early circulation is often dismissed and labelled as a result of false-positive testing. Antibody detection results can indeed be affected by the presence in sera of antibodies which, although able to recognise SARS-CoV-2 antigens, were induced by other agents.12 However, the presence of SARS-CoV-2 neutralising activity in these sera and the fact that several patients presented more than one class of antibodies recognising SARS-CoV-2 suggest that, although some cross-reactivity should be taken into account, at least some of the sera could contain antibodies induced by a prior SARS-CoV-2 infection.10 12 22–24
Direct evidence for early viral circulation can be obtained by detecting SARS-CoV-2 RNA. Techniques used for this purpose, especially if accompanied by sequencing, are more specific and less influenced by previous infections due to other pathogens. However, PCR-based methods are highly sensitive and, therefore, more prone to false-positive results. Various methods have been used to detect SARS-CoV-2 RNA in 2019 samples collected outside of China, including RNA fluorescence in-situ hybridisation,26 RT-qPCR11 14 21 and nested-PCR,21 24 25 and most of the results obtained were confirmed by sequencing.11 21 24 25 Unfortunately, none of these studies could recover complete genomic sequences needed for phylogenetic and coalescence analyses. This is likely a problem tracing back to the unsuitability of the available samples, which did not contain high enough concentrations of high-quality RNA. Even in clinical samples found to be RNA positive, viral load was low, implying that only a few viral templates would be available for amplification. For instance, the positive oropharyngeal and urine samples were collected within the framework of measles surveillance from patients who developed a skin rash,24 25 which may appear late during infection, when respiratory symptoms have been resolved, or even before the onset of other COVID-19 symptoms,35 36 resulting in low viral loads in swabs collected at the moment of the rash onset. Interestingly, one study detected multiple positive patients, some of whom also presented SARS-CoV-2-recognising antibodies, and observed polymorphisms in viral sequences, a factor that excludes laboratory contamination by a single positive control sequence.24
Implications of available early circulation evidence
Some studies that retrospectively screened respiratory samples collected from patients with influenza-like and respiratory symptoms, including those collected within the framework of influenza surveillance, found no evidence for SARS-CoV-2 (table 2),37–41 including during times when community SARS-CoV-2 transmission in respective populations was already occurring.20 Nonetheless, the combination of all knowledge acquired so far from retrospective studies strongly suggests that SARS-CoV-2 was circulating outside of China considerably earlier than the currently postulated time frame of late December 2019/early January 2020, at least in some parts of the world.
Good epidemiological insight comes from sewage water testing. For the virus to be detectable in wastewater, a somewhat sustained virus circulation must be happening, as demonstrated in a recent publication performing environmental surveillance in Lombardy.42 Fongaro et al showed that SARS-CoV-2 load remained temporally constant in the wastewater of Santa Catarina, Brazil, but increased in March 2020 in concomitance with the surge in COVID-19 cases.11 Results from La Rosa et al, who detected the virus in the wastewater of Milan and Turin in December 2019,21 are supported by the detection of SARS-CoV-2 in clinical samples since September 201924 and by the subsequent steady increase in COVID-19 cases observed at the beginning of 2020,20 before the exponential growth in cases became evident. However, it remains puzzling why the virus was detected in samples collected within the framework of measles surveillance and not in those available to the influenza surveillance network. This incongruence could be explained by the fact that skin manifestations draw more attention than respiratory signs and are almost always reported, making measles surveillance more sensitive and comprehensive, while also monitoring a lower and more manageable number of cases. In fact, mild COVID-19 cases may have been masked by the ongoing influenza season43 and the high number of respiratory infections common to this time of the year.
Although the notion of a somewhat sustained but unnoticed viral circulation may be difficult to accept after so many deaths and so many months of pandemic viral circulation, this possibility is ‘not astonishing’, as recently noted by Petti.9 Certainly, the virus had been circulating for some time before such dramatic excess mortality could have become noticeable and a low number of COVID-19-related deaths could have gone unnoticed among the high number of deaths associated with pneumonia of unknown origin that occur every year. Although currently difficult to establish without extensive sequencing of early strains, a virus with reduced transmissibility and/or virulence could explain a slow, undetected diffusion during the early months of its spread, causing only sporadic cases and/or limited outbreaks.44 The emergence of more infectious strains occurred several times during this pandemic,45 and differences in replication dynamics have been demonstrated in vitro for some early strains.46 However, preliminary sequencing results suggest that the strain circulating in pre-pandemic Europe in late 2019 may have been already capable of efficient human-to-human transmission24 and it has been recently hypothesised that SARS-CoV-2 strains may have acquired adaptive mutations in Europe, while spreading in parallel with Asian strains.47 Scattered circulation combined with the lack of awareness may have contributed to a slow and undetected early spread.44 48 49 These are aspects that future studies should investigate by retrospectively identifying, sequencing and studying in vitro early circulating strains.
To determine whether SARS-CoV-2 was already spreading outside of China in 2019, it is crucial to broaden our efforts and consider a wider geographical area and a larger timespan when investigating viral emergence. In particular, future research should focus on samples in which a higher viral load can be expected and which are therefore more suitable for virus detection and sequencing, that is, on samples from severe cases of pneumonia admitted to intensive care units.
Scepticism and reluctance to consider the early origin hypothesis
An analysis of the information transmission chain regarding the hypothesis of early circulation outside China reveals not only a dismissive attitude towards this hypothesis,50 but also that this hypothesis is linked to other hypotheses regarding SARS-CoV-2. In other words, accepting the idea that SARS-CoV-2 or its progenitor(s) might have circulated in many regions of the world for months before it was discovered in Wuhan challenges several widely accepted assumptions about this virus. However, it is important to note that the early origin of SARS-CoV-2 has no bearing on the debate about the laboratory leak versus natural origin, and it does not exclude the possibility of its origin in Hubei or somewhere else in China. The acceptance of the early origin is inconvenient in that we may no longer be able to use the circumstantial evidence of time and location of the first detection of SARS-CoV-2 as our final answer.
Despite the increasing documentation available in support of its early circulation, current scientific literature discussing the origin of SARS-CoV-2 is almost exclusively focused on the November/December 2019 hypothesis, completely ignoring this growing body of contradictory evidence. In fact, the possibility of early circulation is only seldom mentioned or discussed in such papers. Furthermore, as this alternative hypothesis clearly contradicts the timeline that is today held as the most likely, when these studies are cited, it is done dismissively, minimising the results obtained by numerous independent research groups. This attitude, pervasive among high-ranking journals, clearly demonstrates scepticism and has the consequence of avoiding a more critical interpretation of scientific data and of discouraging a constructive scientific debate that should consider all available facts when advancing a hypothesis and re-evaluate assumptions in light of new evidence. Additionally, this bias often results in rejection of manuscripts in support of an early SARS-CoV-2 circulation, reinforcing the ‘echo chamber’ effect. Science is a quest for ultimate truth, which should not be discouraged by such a mindset.
Research into the origins of SARS-CoV-2 is a challenging and fraught undertaking and there is still much that needs to be elucidated. Each study providing evidence for early circulation of SARS-CoV-2 might look inconclusive, but combining all data together reveals an emerging pattern. As all it would take to establish early circulation is a single confirmed positive case, attributing all results to false positives quickly becomes probabilistically untenable. Nonetheless, given the critical implications of these findings, it is important to obtain confirmatory proof for an early viral spread or lack thereof by independent investigations performed in WHO-accredited laboratories, as already occurred in one instance.23 Rescreening with more standardised and sensitive methods will provide confirmation for studies that found evidence for an early circulation and for those that did not. This aspect is crucial especially in those geographical areas affected during the first wave and where the lack of early viral detection is in obvious conflict with overwhelming epidemiological evidence. Additional approaches—such as metagenomic sequencing—could also be used to obtain more sequence information, which is essential for dating the beginning of viral spread more accurately. Finally, a more systematic approach to retrospectively test for anti-SARS-CoV-2 antibodies in numerous serum samples collected from a broad area could allow the identification of seroprevalence peaks that would help filter out potential background noise caused by cross-reactivity. International public health authorities should ideally coordinate such studies.
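The probabilistic argument made earlier in this section, that attributing every positive result to a false positive becomes untenable as findings accumulate, can be illustrated with a short calculation. The sketch below assumes that the findings are statistically independent, which is a strong assumption: shared sources of error, such as a common cross-reactive antibody response or a common assay, would make the true joint probability higher. The function names and the per-study probabilities are hypothetical and are not estimates drawn from the cited studies.

```python
from math import prod


def prob_all_false_positive(false_positive_probs):
    """Probability that every positive finding is spurious.

    Assumption: findings are independent, so the joint probability
    is the product of the individual probabilities.
    """
    for p in false_positive_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return prod(false_positive_probs)


def prob_at_least_one_true(false_positive_probs):
    """Complement: probability that at least one finding is genuine."""
    return 1.0 - prob_all_false_positive(false_positive_probs)


# Hypothetical example: ten findings, each judged 90% likely to be spurious
hypothetical_probs = [0.9] * 10
print(f"{prob_all_false_positive(hypothetical_probs):.3f}")
# prints 0.349
```

Even under such sceptical illustrative inputs, the chance that all ten findings are spurious falls to roughly one in three, and it shrinks further with each additional independent finding. This is why confirmation of even a single case in an accredited laboratory would carry so much weight.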
SARS-CoV-2 has cost the world the lives of millions of people, and the next (inevitable) pandemic might have more devastating outcomes. To be better prepared in the future and to identify the emergence of novel pathogens in a timely manner, it is crucial to fully understand this pandemic and learn from our successes and our failures. Despite the technical limitations of available early origin studies, even a remote possibility that positive tests indicate an early SARS-CoV-2 circulation should be considered sufficient to warrant the scaling up of research to more samples from more regions and through a wider timespan. Time is running out: valuable samples that may contain the key to the understanding of SARS-CoV-2 origin might already have been destroyed as their regulatory storage time requirements lapse. Many more will meet the same fate in the coming months and years. What is there to lose in accepting this hypothesis as tenable and exploring it urgently before the chances of finding the answers to explain how this pandemic emerged are gone forever?
…Let us not waste our time in idle discourse! Let us do something, while we have the chance…at this place, at this moment of time, all mankind is us, whether we like it or not. Let us make the most of it before it is too late! –Waiting for Godot, Samuel Beckett.
Patient consent for publication
This study does not involve human participants.
MC and SB are joint first authors.
Handling editor Seye Abimbola
MC and SB contributed equally.
Contributors MC, SB and AA conceived the initial idea for the manuscript, and MC wrote the first draft. All authors provided critical feedback and approved the final submission.
Funding This study was funded by Romeo and Enrica Invernizzi Pediatric Research Center, Università degli Studi di Milano, Milan, Italy.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.