Sites of variation in SARS-CoV-2 spike protein. Amino acids in bright red have variations in many individuals, pink amino acids vary in fewer individuals, and white amino acids show very few variants.
Viruses, in their own mindless way, are masters of evolution. Two aspects of viral biology make them particularly successful. First, huge populations of viruses are generated as they infect cells and replicate. For example, during peak infection by SARS-CoV-2, there may be 1-100 billion viruses in an infected individual. Second, their molecular machinery for replication is often sloppy, introducing occasional errors in progeny. This is the perfect combination for rapid evolution. During an infection, many variants of the virus may be produced in these populations. Most sequence variations will damage the virus or will be neutral with little change for better or worse, but occasional variants will enhance some aspect of the viral life cycle. These rare advantageous variants have emerged multiple times in SARS-CoV-2, and have caused new waves of infection in the ongoing COVID-19 pandemic.
Scientists around the world have studied the evolution of SARS-CoV-2 to understand its capabilities and help plan for the future. The illustration shown here maps the major sites of variation on the spike protein, based on over 3 million samples that have been sequenced and deposited in the GISAID database. The structure is based on PDB ID 7kj2, but coordinates were taken from SWISSMODEL since the original PDB entry does not have atomic coordinates for several flexible loops. Also, the glycosylation is not shown in this illustration, to make the protein variation easier to see, so you have to imagine the protein covered with multiple carbohydrate chains.
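The idea of mapping variation onto a structure can be sketched in a few lines of Python. This is only an illustration, not the method used for the figure: the sequences are a tiny made-up alignment, the function names are hypothetical, and the color thresholds are assumptions chosen so the toy data fall into all three bins.

```python
def variation_frequency(reference, samples):
    """Return, for each position, the fraction of samples that differ from the reference.

    Assumes every sample is already aligned to the reference (same length).
    """
    counts = [0] * len(reference)
    for seq in samples:
        if len(seq) != len(reference):
            raise ValueError("samples must be aligned to the reference length")
        for i, (ref_aa, aa) in enumerate(zip(reference, seq)):
            if aa != ref_aa:
                counts[i] += 1
    n = len(samples)
    return [c / n for c in counts] if n else [0.0] * len(reference)


def color_bin(freq, high=0.5, low=0.1):
    """Map a variation frequency to a display color.

    The thresholds are hypothetical and tuned for the toy data below.
    """
    if freq >= high:
        return "bright red"
    if freq >= low:
        return "pink"
    return "white"


# Toy alignment (made up for illustration): four samples, ten positions.
reference = "MFVFLVLLPL"
samples = [
    "MFVFLVLLPL",  # identical to the reference
    "MFVFFVLLPL",  # differs at index 4
    "MFVFFVLLPL",  # differs at index 4
    "MFVFLVLLSL",  # differs at index 8
]

freqs = variation_frequency(reference, samples)
colors = [color_bin(f) for f in freqs]

for i, (f, c) in enumerate(zip(freqs, colors), start=1):
    print(f"position {i}: {f:.2f} -> {c}")
```

With millions of real sequences the bookkeeping is the same, but the alignment step and the choice of thresholds would need far more care than this sketch suggests.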
As you can see, the sites of variation are scattered throughout the three-dimensional structure. Scientists are still sorting out the functions of each of these changes, but the roles of a few of the most common sites of variation are becoming clear. The most common mutation (at least currently) is at position 614. It is thought to control the stability of the upper portion of the spike, as described below. Another common mutation, at position 681, is found in a flexible loop that is clipped by the cellular protease furin, breaking the chain into two pieces. The upper part (S1) recognizes the host cell and the bottom portion (S2) directs fusion and entry into the cell. Researchers have found that this cleavage makes the virus more infectious toward respiratory tract cells.
Important variants of SARS-CoV-2 spike with mutations in red and deletions in magenta. The active spike is cleaved into two functional pieces, S1 and S2, shown in turquoise and blue. S1 is composed of several functional domains: the N-terminal domain (NTD), the receptor-binding domain (RBD), and two C-terminal domains (CTD).
During the COVID-19 pandemic, SARS-CoV-2 has spread across the world, and variants have emerged by chance in different countries and rapidly spread from there. Structures of recent variants are shown here (PDB ID 7lwv). They all have multiple changes, including sites where an amino acid has mutated (shown in red) and sites where amino acids have been deleted from the chain (shown in magenta). All include the two common changes mentioned above, along with other changes scattered across the entire structure. These may benefit the virus in many ways: mutations in the receptor-binding domain and C-terminal domains can improve recognition and attachment to cells, changes in the N-terminal domain can help evade the immune system, and mutations in the S2 region can enhance the process of fusion and entry into cells.
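To make the distinction between mutations and deletions concrete, here is a minimal Python sketch that compares a variant sequence to a reference and lists both kinds of change. It assumes the two sequences are already aligned, with "-" marking a deleted residue; the sequences are invented for illustration, the function name is hypothetical, and the labels (such as "D3N" or "E4del") are one common shorthand rather than a notation taken from this article.

```python
def describe_changes(reference, variant, gap="-"):
    """List substitutions and deletions in an aligned variant sequence.

    Positions are numbered along the reference, starting at 1.
    Insertions relative to the reference (gaps in the reference) are
    skipped in this simplified sketch.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")

    substitutions = []
    deletions = []
    pos = 0
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa == gap:
            continue  # insertion in the variant; not counted here
        pos += 1
        if var_aa == gap:
            deletions.append(f"{ref_aa}{pos}del")
        elif var_aa != ref_aa:
            substitutions.append(f"{ref_aa}{pos}{var_aa}")
    return substitutions, deletions


# Made-up aligned sequences, not real spike protein data.
reference = "ACDEFGHIKL"
variant = "ACN--GHIRL"

subs, dels = describe_changes(reference, variant)
print("mutations (red):", subs)
print("deletions (magenta):", dels)
```

In the figure's color scheme, the entries in the first list would correspond to red sites and those in the second list to magenta sites.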