Professor Giorgio Gallinella, virologist, Microbiology Unit (U.O.), S. Orsola-Malpighi Hospital. Associate Professor of Microbiology at the University of Bologna
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) belongs to the Coronaviridae, a large and diverse family of RNA viruses capable of infecting many vertebrate hosts, including humans. The family includes long-known viruses that infect humans and cause mild respiratory infections (e.g., OC43, 229E, …). In 2002-2003, the emergence and spread of a severe infection caused by a new coronavirus transmitted from an animal host to humans, named SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus), had already been observed. MERS-CoV (Middle East Respiratory Syndrome Coronavirus), another coronavirus capable of causing severe infections, was identified in 2012 and is sporadically transmitted to humans from animals (camelids). In both cases, the viruses did not adapt effectively to humans and, partly thanks to preventive measures, the infection was either eradicated (SARS-CoV) or kept under control as sporadic cases (MERS-CoV).
SARS-CoV-2 emerged at the end of 2019 as a new virus able to adapt to human hosts and has since become responsible for the current pandemic. The disease it causes is named COVID-19 (coronavirus disease 2019). The scientific community is working to better understand the virus, its spread and its ability to cause disease, and to identify strategies for the prevention and treatment of the infection.
Emergence and spread
Severe cases of respiratory tract infection without a well-defined diagnosis were observed in the city of Wuhan, China, from early December 2019. The cases were reported to the WHO later that month, and at the beginning of January molecular investigation techniques identified a new coronavirus as the responsible agent. The first sequence of the virus genome was published on January 10, 2020, which made it possible to prepare molecular diagnostic tests for direct, large-scale diagnosis. The containment measures adopted in China reduced the spread of the virus within Chinese territory, but they could not prevent its spread beyond China's borders, which had already occurred before the virus was identified. Since February 2020, the virus has been progressively detected in countries on every continent, and it continues to spread on a global scale. Genetic evidence suggests that the virus originated in an animal reservoir and was then transmitted to humans. However, the primary transmission event cannot be reconstructed with certainty, whether in terms of date, place or mode.
Molecular traits and evolution
Knowledge of the genetics and evolution of the virus is accumulating rapidly thanks to newly available molecular investigation techniques: in particular, the ability to determine the viral genome sequence quickly and accurately, the public availability of the data, and bioinformatics techniques that allow genetic data to be compared on a large scale. These studies make it possible, first, to analyse the relationship of SARS-CoV-2 to other known viruses of the same family and, second, to reconstruct the global spread and evolution of the virus, including the identification of mutations that may make the virus progressively better adapted to the human host.
Picture 1. Graphic representation of SARS-CoV-2. It shows the complex of RNA and N nucleoprotein, the M membrane protein, the E membrane protein (minor protein, pentamer) and the S membrane protein (major protein, “spike”, trimer). From Expasy, Viral Zone (https://viralzone.expasy.org/).
The virus particle (virion) consists of an RNA genome (as in many viruses) complexed with viral proteins to form the nucleocapsid, which is enclosed by a membrane derived from the infected cell but modified by the insertion of virus-specific proteins. The SARS-CoV-2 genome is about 30,000 bases long, a typical length for this family of viruses and among the largest of all RNA viruses. The genome codes for six main proteins. One is an enzyme, the RNA replicase, which is not part of the virion but drives the replication of the viral genome in the infected cell. The others are proteins that contribute to forming the complete, infectious viral particle, such as the S protein (spike), the M protein (membrane) and the N protein (nucleocapsid). Other minor proteins, which may play an important role in the ability of the virus to replicate and cause disease, are also encoded.
Picture 2. Graphic representation of SARS-CoV-2 mRNA and its proteins. pp1a and pp1ab are precursor proteins that are subsequently cleaved into smaller proteins, each serving a different function in virus replication and in the interaction with the host cell. Notable among them is RdRp, the enzyme that replicates the viral genome. The other proteins are produced directly in their final form and, in addition to the major ones, include some accessory proteins (3a, 6, 7a/b, 8, 9b, 14), which are also important in the pathogenesis of the infection.
From Expasy, Viral Zone (https://viralzone.expasy.org/).
Special attention has been paid to the S protein, which is found in the virus envelope. It gives the virus the characteristic morphology from which the entire family takes its name, and it is responsible for the interaction with receptors on human cells, which is the initial step in the viral replicative cycle. Changes in its constituent amino acids and in its three-dimensional structure can cause significant variations in the transmissibility of the virus, with regard both to the original adaptation to animal hosts and to human-to-human transmission.
SARS-CoV-2 is classified in the genus Betacoronavirus, together with other well-known viruses such as the respiratory viruses OC43 and HKU1. Compared with other emerging viruses of animal origin, it shows about 50% genetic similarity to MERS-CoV and 79% to SARS-CoV. SARS-CoV (responsible for SARS in 2002) and SARS-CoV-2 (responsible for COVID-19) belong to two similar yet distinct lines of viruses, alongside many other viruses identified in animal hosts, mainly bats (SARSr-CoV, SARS-related coronavirus). Although the two viruses are similar and presumably emerged from a broad animal reservoir, they are genetically clearly distinct from other known viruses and, after adapting to humans, gave rise to two independent and separate evolutionary lineages.
The viruses most similar to SARS-CoV-2 and to the earlier SARS-CoV have been isolated from bats, in investigations carried out in the same regions of China where the outbreak began. These animal viruses show more than 90% genetic identity with SARS-CoV-2, in some cases exceeding 97%. Yet such genetic distances can correspond to decades of evolution, so the results point to the existence of a viral reservoir in bats rather than to a direct ancestor of the viruses currently circulating. Other viruses similar to the human pathogen have been isolated from pangolins, again in investigations in the outbreak areas of China. The pangolin viruses cluster together genetically and are more distant from the human virus than those isolated from bats. Interestingly, there are convergent mutations in the S protein, in the region involved in recognising the cell receptor, that are very similar in the pangolin virus and in SARS-CoV-2. This does not imply that the human virus derives from those of the pangolin. The pangolin is probably just another host colonised by coronaviruses, possibly of exogenous origin. In conclusion, it is reasonable to attribute the coronavirus spillover to a bat reservoir, whereas further investigation is needed to establish whether an intermediate host between the initial reservoir and humans was necessary and, if so, which one. At present, the available genetic data do not allow verifiable hypotheses on this matter.
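To give a sense of what the identity percentages above mean in absolute terms, the following Python sketch computes the percent identity between two aligned sequences and converts an identity figure into an approximate number of differing bases. It is an illustration only: the function names are hypothetical, the sequences are assumed to be already aligned, and the genome length of 30,000 bases is the rounded figure quoted in this article.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base.

    Assumes the two sequences are already aligned (same length);
    positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def differing_bases(identity_percent: float, genome_length: int = 30_000) -> int:
    """Approximate number of differing bases implied by a given percent identity.

    genome_length defaults to the rounded ~30,000 bases quoted in the text.
    """
    return round(genome_length * (100.0 - identity_percent) / 100.0)


if __name__ == "__main__":
    # Toy example: two short made-up fragments differing at one position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
    # Differences implied by 97% and 90% identity over the whole genome.
    print(differing_bases(97.0), differing_bases(90.0))
```

Under these assumptions, 97% identity over about 30,000 bases still corresponds to roughly 900 differing bases, which helps explain why even the closest bat viruses are regarded as relatives rather than direct ancestors.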
Molecular sequence data from SARS-CoV-2 isolates are accumulating rapidly, in step with the spread of the virus and its identification in laboratories all over the world. Most sequences are made public and are available in online databases, which provide the data for phylogenetic and evolutionary analysis. Bioinformatics tools make it possible to compare the sequences obtained, identify the mutations that have arisen and group the sequences by similarity, and so to propose hypotheses on the geographic and temporal spread of the virus and on its evolution. Some mutations are found in only a few isolates, while others are more widely represented and can serve as markers to identify evolutionary lineages, known as clades. Some of these mutations also have a recognised functional role. For instance, a widespread mutation in the S protein (D614G) gives the virus greater transmissibility and has replaced the original variant thanks to the selective advantage it confers.
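The comparison described above can be sketched in a few lines of Python. This is a simplified illustration, not the method used by any specific tool: the function names and the toy protein fragments are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and real analyses build phylogenetic trees rather than grouping by exact mutation sets. Mutation labels follow the same convention as D614G: original amino acid, position, new amino acid.

```python
import re
from collections import defaultdict


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'D614G' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def find_mutations(reference: str, isolate: str) -> list[str]:
    """List substitutions of an isolate relative to a reference (1-based positions)."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, isolate), start=1)
        if ref != alt
    ]


def group_by_mutations(reference: str, isolates: dict[str, str]) -> dict[frozenset, list[str]]:
    """Group isolate names by the exact set of substitutions they carry."""
    groups = defaultdict(list)
    for name, sequence in isolates.items():
        groups[frozenset(find_mutations(reference, sequence))].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Made-up six-residue fragments, for illustration only.
    reference = "MFVDLT"
    isolates = {"iso1": "MFVDLT", "iso2": "MFVGLT", "iso3": "MFVGLT"}
    for mutations, names in group_by_mutations(reference, isolates).items():
        print(sorted(mutations), names)
    print(parse_mutation("D614G"))
```

In this toy data set, the two isolates sharing the same substitution end up in one group, which mirrors in miniature how a shared marker mutation can define a clade.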
Picture 3. SARS-CoV-2 molecular epidemiology. The dots represent viral isolates, colour-coded by continent of origin. The distance from the centre represents the accumulation of mutations in the genome; the genomes are grouped by similarity in different radial sectors. From Nextstrain-GISAID (https://nextstrain.org/ncov/).
Picture 4. Distribution of genetic diversity among isolates along the SARS-CoV-2 genome (differences in amino acids). The higher the bar, the greater the diversity, which is concentrated in a few locations. Of particular relevance is the variability found in the region that codes for the S protein.
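A per-position diversity profile of the kind shown in Picture 4 can be approximated with a short Python sketch. The choice of Shannon entropy as the diversity measure is an assumption made here for illustration (the caption does not state which measure the figure uses), and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (in bits) of the residues observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def diversity_profile(aligned_sequences: list[str]) -> list[float]:
    """Entropy at every position of a set of aligned, equal-length sequences."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    return [column_entropy(column) for column in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Made-up four-residue fragments; only the last position varies.
    toy_alignment = ["MFVD", "MFVG", "MFVD", "MFVG"]
    print(diversity_profile(toy_alignment))
```

Positions where all isolates agree score zero, and the score rises as the residues at a position become more evenly split, so plotting the profile gives bars that are high only at the few variable locations.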