Diverse tendencies in codon usage evolution of SARS-CoV-2 genes

Paweł Błażej (1), Dorota Mackiewicz (1), Paweł Mackiewicz (1)

1 Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383 Wrocław, Poland

For correspondence: pawel.blazej{at}uwr.edu.pl

bioRxiv, New Results. doi: https://doi.org/10.1101/2025.05.30.657095

Abstract

The dynamic evolution of SARS-CoV-2 virus since the COVID-19 outbreak in late 2019 has raised questions about potential evolutionary trends in protein-coding sequences and their adaptation to the human host. To address this, we compiled a dataset of 94,571 complete genomes with known collection dates, spanning from January 2020 to October 2024. Using a novel representation of codon usage, we recoded SARS-CoV-2 protein-coding sequences to strings of labels reflecting the human synonymous codon usage. Our analysis reveals different evolutionary pathways in the codon usage between structural, non-structural and accessory protein-coding sequences from the coronavirus. The genes coding for structural proteins tend to exhibit a less optimal adaptation to the human codon usage, whereas open reading frames ORF1a and ORF1ab encoding non-structural proteins show an opposite trend. The sequences for the accessory proteins demonstrated a variable tendency to change the codon preferences.
The evolution of the more optimal codon usage in ORF1a and ORF1ab sequences can be associated with a higher speed and efficiency of translation of the encoded polyproteins. Following their cleavage, the products play important roles in viral replication and transcription. Thus, the adaptation of their codons can increase the proliferation of the virus. In contrast, alterations in codon usage within structural protein-coding sequences may be associated with less accurate translation and folding during their synthesis, which can provide an advantage in evading the host immune response. The results show that codon usage adaptations to the human host differ depending on gene type and function, reflecting a balance between conflicting evolutionary pressures. Our findings on variations in codon usage among coronavirus genes provide valuable insights that can aid in developing new strategies for the optimization of codons in vaccine mRNA and DNA for emerging strains.

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent responsible for the COVID-19 pandemic, emerged in Wuhan, China, in late 2019 (1). SARS-CoV-2 is classified as a betacoronavirus and shares a phylogenetic relationship with SARS-CoV and MERS-CoV but demonstrates distinct epidemiological and clinical characteristics (2). The SARS-CoV-2 genome consists of a positive-sense, single-stranded RNA molecule of approximately 30 kb (3). The typical genome organization includes 14 open reading frames (ORFs), which encode 31 proteins. Among them are two large polyproteins (encoded by ORF1a and ORF1ab), four structural proteins and several accessory proteins (4, 5).
The polyproteins are translated directly from the viral genomic RNA and subsequently cleaved into 16 non-structural proteins, which perform essential functions in viral replication, transcription, RNA modification and protein processing, as well as in suppressing host gene expression and the immune response. The structural proteins comprise the surface glycoprotein, known as the Spike protein (S), the envelope protein (E), the membrane glycoprotein (M) and the nucleocapsid phosphoprotein (N) (6). The S protein is the key to host cell entry. It binds to the ACE2 receptor on the cell surface, triggering fusion of the viral and cellular membranes and allowing the virus to enter the cell. The E protein is a small protein involved in viral assembly, budding and release. It also acts as an ion channel and contributes to the pathogenicity of the virus. The M protein is the most abundant structural protein. It helps maintain the virion's shape and stability, interacts with other structural proteins and plays a crucial role in viral assembly. The N protein binds to the viral RNA genome, forming the ribonucleoprotein complex. It protects the viral RNA and is involved in viral replication and transcription. Products of other ORFs play important roles mainly in modulating host immune responses and facilitating viral replication.

After the Spike protein binds to the host cell receptor ACE2, it undergoes conformational changes that allow the viral envelope to fuse with the host cell membrane, releasing the viral genome into the cytoplasm (7). Since the viral RNA acts as mRNA, it is translated by host ribosomes into the two large polyproteins (8, 9). After their cleavage by viral proteases, the products form a replication-transcription complex, which synthesizes new viral genomes using the RNA-dependent RNA polymerase. Other proteins, structural and accessory ones, are produced by discontinuous transcription of subgenomic mRNAs and subsequent translation.
The newly synthesized genomic RNA and N proteins interact with E and M proteins in the ER-Golgi intermediate compartment to form new virions, which are released from infected cells via exocytosis.

The viral strategy involves efficient replication and the production of a vast array of proteins, which are then assembled into numerous virions. Thus, we can expect that the evolution of mechanisms accelerating these processes should be favoured. One such mechanism is a change in synonymous codon usage, which can influence the speed and efficiency of translation (10–15). Many studies indicate that highly expressed genes prefer codons that are recognized by more abundant tRNA molecules. Since SARS-CoV-2 utilizes the human host machinery, we can expect a relationship between the codon usage in the viral genes and the codon bias typical of Homo sapiens genes. Codon usage can also affect the speed and accuracy of translation elongation, causing changes in co-translational protein folding (15–21). It can also be important for modifying viral proteins that interact directly with the host's immune system.

Certain analyses showed a general adaptation of viral to human codon usage (22, 23) or specifically to the codon usage of the upper respiratory tract and alveoli (24). However, other studies reported that strains with new mutations (25), all genes considered together (26, 27) or only the genes for the N and S proteins show a lower adaptation to the codon usage of the human host (28). Some authors noticed that the codon adaptation index (CAI) values calculated for the concatenated coding sequences of the coronavirus have decreased over time with occasional fluctuations (29). However, the period considered in that study was rather short. The inconsistency in results reported by different authors may stem from the use of diverse methodologies and codon parameters, as well as the analysis of varied datasets.
Therefore, we developed a new measure that recodes codon usage in terms of the human host and studied a larger number of genomes collected over a longer period of the coronavirus evolution. This allowed us to track temporal changes in protein-coding sequences over a longer time.

Material and methods

A. Dataset

We analyzed 94,571 SARS-CoV-2 genomes downloaded from the NCBI Virus database. We selected only complete genomes isolated from humans and possessing a correctly annotated collection date and country. We studied protein-coding sequences from these genomes that were identified by the VADR v1.4.2 (Viral Annotation DefineR) software (30). The reference model was vadr-models-sarscov2-1.3-2, which is based on the NC_045512.2 RefSeq sequence obtained from isolate Wuhan-Hu-1 (31). The downloaded set comprised genomes from 94 countries around the world. However, genomes sequenced in the USA and the United Kingdom constituted over 88% of the dataset, i.e. 84,324 genomes. Therefore, to avoid an unbalanced dataset, we decided to consider data only from these two countries. The collection dates for this filtered dataset spanned from January 2020 to October 2024.

B. Codon usage recoding

We developed a novel measure termed codon block recoding (CBR), which maps the 64 codons to a set of ordered categorical labels reflecting the synonymous codon usage patterns. Specifically, CBR assigns each codon a label from 0 to 5, corresponding to its relative usage among the synonymous codons encoding the same amino acid in the protein-coding sequences of an organism. The most frequent codon in a synonymous block is assigned the label 0, whereas the least frequent codon is assigned the label n − 1, where n is the number of codons in the block. Methionine and tryptophan, each represented by a single codon, have the label 0. This mapping can be applied to any variant of the genetic code and used for any genome.
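The ranking rule behind CBR can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `codon_block_labels` and `recode` are hypothetical, the alphabetical tie-breaking rule is an assumption (the text does not say how equal frequencies are handled), and the frequencies in `TOY_USAGE` are illustrative placeholders rather than the values of Fig. 1.

```python
def codon_block_labels(usage):
    """Map each codon to its rank (0 = most used) within its synonymous block.

    `usage` maps an amino acid (or '*') to a dict {codon: relative frequency}.
    """
    labels = {}
    for block in usage.values():
        # Sort by decreasing frequency; ties are broken alphabetically
        # (an assumption, not stated in the text).
        ranked = sorted(block, key=lambda codon: (-block[codon], codon))
        for rank, codon in enumerate(ranked):
            labels[codon] = rank
    return labels


def recode(sequence, labels):
    """Turn an in-frame coding sequence into a list of codon labels."""
    sequence = sequence.upper().replace("U", "T")
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    return [labels[sequence[i:i + 3]] for i in range(0, usable, 3)]


# Illustrative frequencies only; a real table would cover all 64 codons.
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "F": {"TTT": 0.45, "TTC": 0.55},
    "L": {"TTA": 0.08, "TTG": 0.13, "CTT": 0.12,
          "CTC": 0.20, "CTA": 0.07, "CTG": 0.40},
}

toy_labels = codon_block_labels(TOY_USAGE)
print(recode("ATGTTTCTGTTA", toy_labels))  # → [0, 1, 0, 4]
```

In this toy table the six leucine codons receive labels 0 to 5, the two phenylalanine codons receive 0 and 1, and the single methionine codon receives 0, which mirrors the n − 1 rule described above. A codon absent from the table (for example one containing an ambiguous base) would raise a `KeyError` in this sketch.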
The labels {0, 1, 2, 3, 4, 5} are ordered categorical variables, which can be useful in qualitative studies. They are also easy to interpret and can be used, for example, to analyze codons in groups characterized by distinctly high or low relative usage. In this study, we applied human synonymous codon usage (HCU) to obtain the human codon block recoding (HCBR), which is presented in Fig. 1. The table includes the canonical codon blocks encoding the 20 amino acids and the stop codons (*), with the relative frequencies of synonymous codons in human genes and the respective assigned labels. The codon usage values were obtained from the Codon Usage Database via the python_codon_tables package (https://github.com/Edinburgh-Genome-Foundry/codonusage-tables/tree/master/python_codon_tables). The HCBR labels, or briefly the codon labels, were used to recode codons in individual coronavirus protein-coding sequences.

Fig. 1. The human synonymous codon usage (HCU) table with assigned human codon block recoding (HCBR) labels. Cod - codon; AA - encoded amino acid; HCU - human synonymous codon usage; LA - assigned label corresponding to the usage.

Results

C. Codon usage in a single sequence

The codon block recoding introduces a new representation of protein-coding sequences in terms of synonymous codon usage. Fig. 2 presents, as an example, the surface glycoprotein-coding sequence from the SARS-CoV-2 genome of isolate Wuhan-Hu-1, recoded according to HCBR. The individual codons are assigned labels corresponding to the rank of synonymous codons according to their relative usage in human protein-coding genes. Each dot represents the label of a codon (y-axis) at its position in the sequence (x-axis). The codons labeled 0, 1, 2 and 3 are predominant in the whole set. Some codons labeled 4 and 5 are grouped in clusters in the sequence.

Fig. 2.
The surface glycoprotein-coding sequence from SARS-CoV-2 with codons recoded by its HCBR label.

The usage of recoded codons in the surface glycoprotein-coding sequence from the coronavirus differs from that in all human protein-coding genes (Fig. 3). The synonymous codons assigned to group 0, i.e. characterized by the highest relative usage in human genes, are two times less frequent in the coronavirus sequence than in the human genes, whereas those labeled 1 are more numerous. The less optimal codons labeled 2, 3 and 4 are also slightly more frequent in the surface protein gene than in the human genes. The frequency of codons labeled 5 is comparable. We observed the same tendencies for the first two codon types in the other coronavirus protein-coding sequences, with the exception of the envelope protein-coding sequence, in which the codons labeled 1 were also less frequent than in the human genes.

Fig. 3. The distribution of codons labeled according to HCU for the surface glycoprotein-coding sequence from SARS-CoV-2 and all human genes.

Since we are interested in the codons commonly used in human protein-coding sequences, we calculated the combined frequency of codons labeled 0 and 1, which measures the abundance of codons preferred in human genes. The frequency of such codons in the example sequence equals 0.671, whereas in the human protein-coding sequences it equals 0.763. Thus, the usage of these codons in the surface glycoprotein-coding sequence is lower than expected in human genes. Similarly, all other coronavirus gene sequences contain fewer of these codons than would be expected, especially the gene for the envelope protein and ORF10 (Table 1).

Table 1. The fraction of codons labeled 0 and 1 (0 + 1) calculated for SARS-CoV-2 protein-coding sequences from the NC_045512.2 reference genome of isolate Wuhan-Hu-1.

D. Changes of synonymous codon usage in time

The human codon block recoding was applied to track changes over time in the synonymous codon usage of individual protein-coding sequences from SARS-CoV-2. We investigated eleven functional open reading frames (ORFs) or genes annotated in SARS-CoV-2 genomes: ORF1ab, ORF1a, the genes for the envelope protein, membrane glycoprotein, nucleocapsid phosphoprotein and surface glycoprotein, as well as several accessory open reading frames: ORF3a, ORF6, ORF7a, ORF8 and ORF10. The studied sequences were grouped according to whether they encode structural, non-structural or accessory proteins, given their distinct roles in the viral life cycle. Following recoding, we calculated, for each gene sequence, the frequency of the codons most frequently used in human genes, i.e. those labeled 0 or 1. The values were aggregated monthly using the arithmetic mean over the sequences. This allowed us to describe the central tendency of the available variants of protein-coding sequences in a month and to find possible evolutionary trends in their codon usage over time.

Non-structural protein-coding sequences

Fig. 4 depicts changes in 0 + 1 codon frequencies over time for each non-structural protein-coding sequence, i.e. ORF1a and ORF1ab, using the arithmetic mean calculated monthly. Since these two open reading frames overlap over a long section, the changes in the frequency are very similar. The codons labeled 0 + 1 were initially relatively rare, from January 2020 to December 2021. From then until April 2022, we can observe a substantial shift towards higher frequencies of synonymous codons that are also the most commonly used in human genes. Next, their frequency remained at a similar, relatively high level, with a small decrease from the end of 2023 to the beginning of 2024. After that, the frequency gradually increased again. The global tendency, in the long run, indicates that these sequences optimized their synonymous codon usage towards that of human protein-coding genes.
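The 0 + 1 frequency and its monthly arithmetic mean, as used for the curves in this section, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `preferred_fraction` and `monthly_mean` are hypothetical, the input is assumed to be already recoded label lists for one gene paired with collection dates, and the three example records are invented for demonstration and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def preferred_fraction(labels, preferred=(0, 1)):
    """Fraction of codons whose label is in `preferred` (0 + 1 by default)."""
    if not labels:
        raise ValueError("empty label sequence")
    return sum(label in preferred for label in labels) / len(labels)


def monthly_mean(records):
    """Average the 0 + 1 fraction over all sequences collected in the same month.

    `records` is an iterable of (collection_date, labels) pairs for one gene.
    Returns {(year, month): mean fraction} in chronological order.
    """
    by_month = defaultdict(list)
    for collected, labels in records:
        by_month[(collected.year, collected.month)].append(preferred_fraction(labels))
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Hypothetical recoded sequences, for illustration only.
records = [
    (date(2021, 11, 3), [0, 1, 2, 0]),   # 3 of 4 codons labeled 0 or 1
    (date(2021, 11, 20), [0, 3, 4, 1]),  # 2 of 4
    (date(2021, 12, 5), [5, 1, 2, 3]),   # 1 of 4
]
print(monthly_mean(records))
# → {(2021, 11): 0.625, (2021, 12): 0.25}
```

Keying the result by (year, month) keeps the series sortable without any date parsing; plotting one such series per gene would give curves of the kind described for Figs. 4 to 6.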
Fig. 4. The changes in the frequency of codons labeled as 0 and 1 in SARS-CoV-2 non-structural protein-coding sequences from January 2020 to October 2024. The dynamics of the changes are described by the arithmetic mean calculated monthly.

Structural protein-coding sequences

In contrast to the sequences coding for non-structural proteins, those encoding structural ones showed a general long-term decrease in the codons frequently used in the human host (Fig. 5). In all four sequences, the 0 + 1 frequencies at the end of the studied period in 2024 are lower than those in January 2020. Thus, the structural protein-coding sequences tended to use less optimal codons in terms of HCU over time. Interestingly, the dynamics of the changes differ between the sequences. In the case of the surface glycoprotein-coding sequence, we observed a general trend of decreasing 0 + 1 frequency with several local extrema: local minima in September 2021 and February 2023 as well as local maxima in February 2022 and October 2023 (Fig. 5). These fluctuations suggest that the decrease in the contribution of 0 + 1 codons was not linear but was disturbed by occasional variations.

Fig. 5. The changes in the frequency of codons labeled as 0 and 1 in SARS-CoV-2 structural protein-coding sequences from January 2020 to October 2024. The dynamics of the changes are described by the arithmetic mean calculated monthly.

The three other structural protein-coding sequences demonstrated a characteristic decline from November 2021 up to March 2022 (gene for envelope protein), February 2022 (gene for membrane glycoprotein) or August 2022 (gene for nucleocapsid phosphoprotein). All these curves have an inflection point at the turn of 2021 and 2022. The genes encoding the envelope protein and membrane glycoprotein exhibited a consistently high frequency of codons 0 and 1 up to November 2021.
In contrast, the frequency of these codons in the nucleocapsid phosphoprotein gene fluctuated, peaking in October 2021. After the drastic decrease, the codon frequency generally remained stable over time, but in the genes for membrane glycoprotein and nucleocapsid phosphoprotein it fluctuated, reaching several local extrema.

Accessory protein-coding sequences

The sequences coding for accessory proteins do not show consistent tendencies in the change of 0 + 1 codon frequencies (Fig. 6). However, we can notice a long-term drop in this measure for the ORF6 sequence. Up to November 2021, the frequency was relatively high. Between November 2021 and April 2022 there was a drastic decrease with the minimum in March 2022 and a fast increase from April 2022 to September 2023. After a short stabilization at a relatively high level, the frequency diminished again from January 2023. Then, from June 2023 the frequency remained permanently low. ORF3a, ORF7a and ORF8 showed a rather high and constant 0 + 1 frequency over time, with an episodic quick decline and rise in different periods: November 2021-April 2022 (minimum in February 2022), February 2023-January 2024 (minimum in July 2023) and February 2021-February 2022 (minimum in October 2021), respectively. In contrast, the frequency of codons 0 and 1 in ORF10 was rather low throughout the studied period, but between May 2023 and May 2024 there was a sudden increase and decrease with the maximum in October 2023.

Fig. 6. The changes in the frequency of codons labeled as 0 and 1 in SARS-CoV-2 accessory protein-coding sequences from January 2020 to October 2024. The dynamics of the changes are described by the arithmetic mean calculated monthly.

Discussion

The recoding of codons according to the synonymous codon usage in humans proved useful in detecting changes over time in the codon preferences of protein-coding sequences from SARS-CoV-2 genomes.
We studied the combined frequency of the two most widely used synonymous codons in human protein-coding genes. Various groups of coronavirus gene sequences demonstrated different tendencies. This frequency in ORF1ab and ORF1a increased as time progressed. Although the fraction of these codons never exceeded the expected value calculated for human codon usage, the increase in this frequency suggests a clear tendency towards optimization of the viral codon usage with respect to human codon preferences. These ORFs encode polyproteins, which are cleaved into smaller non-structural proteins. The enhanced adaptation of these sequences is likely linked to the critical role of their products in viral replication and the expression of other genes at the early stage of the virus infection cycle. Since the coronavirus utilizes the host translational machinery, including tRNAs, the usage of codons also preferred in human genes can facilitate and speed up protein biosynthesis. This is in line with the general view that synonymous codon bias can affect the rate and efficiency of translation (10–15). Consequently, more efficient translation, leading to a higher yield of polyproteins and their products, can accelerate viral proliferation and transmission. Our findings indicate that the codon usage of key viral proteins adapts over time to align with the host's codon bias. In agreement with that, a general study of codon usage in the coding sequences of 502 human-infecting viruses, including SARS-CoV-2, found that the adaptation is visible in early viral proteins (24).

However, the structural protein-coding sequences exhibited a decreasing tendency in the frequency of the synonymous codons preferred by the human host. This suggests that these sequences tend to use less optimal codons in terms of HCU, which agrees with the results reported in (28).
The usage of poorly optimized codons may cause slower and less accurate translation elongation, which also influences the folding of synthesized proteins (15–21). Consequently, the overall production of viral proteins can be reduced. Moreover, deviations from optimal translation rates as well as undesirable interactions between codons and non-cognate tRNAs can increase the number of misfolded protein variants (32, 33). Since the structural proteins are exposed to the host immune system, their lower abundance resulting from non-optimal translation can be beneficial for the virus owing to reduced recognition by the immune system. Moreover, the variable structure of epitopes can help the virus avoid the host response. Additionally, the rapid evolution of codon usage patterns over time can enhance the virus's ability to adapt. In fact, we observed fluctuations in the usage of the optimal codons for the genes encoding the surface glycoprotein and nucleocapsid phosphoprotein, which elicit a strong immune response (34, 35); these fluctuations can be associated with changes in the structure of these proteins over time.

The genes for accessory proteins demonstrated variable tendencies in optimal codon usage. Because of their diverse functions, it is difficult to find a general explanation for the changes in their codon frequencies. Nevertheless, the tendency to evolve and maintain a low frequency of the optimal codons, observed in the structural and some accessory protein-coding sequences, can also be associated with greater flexibility and the ability to invade a broader range of hosts with different codon usage signatures (36, 37). In fact, it was found that SARS-CoV-2 can infect a range of mammalian species (38–40). It was also proposed that viruses with codon usage too similar to that of the host can be harmful to its cells owing to the depletion of the tRNA pool (41, 42).
This can disrupt host translation and other cellular processes, potentially leading to over-exploitation of the host cell, which may ultimately limit the virus's ability to multiply intensively.

Eight out of the 11 studied protein-coding sequences revealed a drastic change in codon usage over a short time at the turn of 2021 and 2022. These changes correspond very well to the shift from the Delta to the Omicron variant (43–45). Omicron was first identified on November 24, 2021 in South Africa and on December 1, 2021 in the United States. By the week ending December 25, it had already become the dominant variant. The emergence of Omicron likely resulted from its high mutation rate and ability to evade immunity, as well as the global connectivity and transmission that facilitated its rapid spread. Using data filtered for the USA and the United Kingdom provided by (46), we also found that towards the end of 2021 there was a significant increase in daily COVID-19 vaccine doses administered per million people, which culminated in December 2021. Vaccinations can also contribute to elevated selective pressure on the virus, favouring the emergence of mutations that enable it to evade immune responses. As a result, more viruses could replicate, thereby providing additional opportunities for mutations to occur (47, 48). According to this dataset, daily new confirmed COVID-19 cases per million people started to grow from the middle of December 2021 and reached the maximum in these countries in the middle of January 2022. The larger number of infected individuals also provided more opportunities for the virus to replicate, which in turn increases the probability of generating new mutations. It is conceivable that the changes in codon usage observed in our study resulted from the accumulation of new mutations and were associated with the phenomena mentioned above.
Our findings also suggest that changes in codon usage, in terms of optimality for the human host, vary depending on the type and role of the genes. This reveals a trade-off between their competing evolutionary strategies. Our results, highlighting variations in codon usage across different coronavirus protein-coding genes, provide valuable insights that can aid in designing effective attenuated vaccines for novel strains, thereby supporting efforts to combat the pandemic. This can be achieved through codon (de)optimization of the mRNA and DNA used in these vaccines.

References

1. Na Zhu, Dingyu Zhang, Wenling Wang, Can Li, Bing Yang, Jianguo Song, Xing Zhao, Bin Huang, Wei Shi, Rou Lu, et al. A novel coronavirus from patients with pneumonia in China, 2019. New England Journal of Medicine, 382(8):727–733, 2020.
2. Roujian Lu, Peng Yang, Jin Cui, Qi Zhang, Li Fan, Yongtao Dai, Qiang Wu, Lin Li, Shuye Zhang, Lei Zhu, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for outbreak origin and receptor binding. The Lancet, 395(10228):565–574, 2020.
3. Fan Wu, Su Zhao, Bing Yu, Yi-Ming Chen, Wei Wang, Zhi-Gang Song, Yuan Hu, Zhi-Wei Tao, Yu-Hua Huang, Wen-Hao Tan, et al. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. bioRxiv, 2020.
4. Natalia Redondo, Sara Zaldívar-López, Juan J. Garrido, and Maria Montoya. SARS-CoV-2 accessory proteins in viral pathogenesis: Knowns and unknowns. Frontiers in Immunology, 12, July 2021. ISSN 1664-3224. doi: 10.3389/fimmu.2021.708264.
5. Shubhangi Gupta, Deepanshu Gupta, and Sonika Bhatnagar. Analysis of SARS-CoV-2 genome evolutionary patterns. Microbiology Spectrum, 12(2), February 2024. ISSN 2165-0497. doi: 10.1128/spectrum.02654-23.
6. Daniel Wrapp, Nianshuang Wang, Kizzmekia S Corbett, Eranga De Silva, G Christopher Ippolito, and Jason S McLellan. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science, 367(6483):1260–1263, 2020.
7. Massimo Pizzato, Chiara Baraldi, Giulia Boscato Sopetto, Davide Finozzi, Carmelo Gentile, Michele Domenico Gentile, Roberta Marconi, Dalila Paladino, Alberto Raoss, Ilary Riedmiller, Hamza Ur Rehman, Annalisa Santini, Valerio Succetti, and Lorenzo Volpini. SARS-CoV-2 and the host cell: A tale of interactions. Frontiers in Virology, 1, January 2022. ISSN 2673-818X. doi: 10.3389/fviro.2021.815388.
8. Maria Romano, Alessia Ruggiero, Flavia Squeglia, Giovanni Maga, and Rita Berisio. A structural view of SARS-CoV-2 RNA replication machinery: RNA synthesis, proofreading and final capping. Cells, 9(5):1267, May 2020. ISSN 2073-4409. doi: 10.3390/cells9051267.
9. Brandon Malone, Nadya Urakova, Eric J. Snijder, and Elizabeth A. Campbell. Structures and functions of coronavirus replication–transcription complexes and their relevance for SARS-CoV-2 drug design. Nature Reviews Molecular Cell Biology, 23(1):21–39, November 2021. ISSN 1471-0080. doi: 10.1038/s41580-021-00432-z.
10. T. Ikemura. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol, 2(1):13–34, 1985. ISSN 0737-4038 (Print) 0737-4038 (Linking).
11. Michael Bulmer. Coevolution of codon usage and transfer RNA abundance. Nature, 325(6106):728–730, February 1987. ISSN 1476-4687. doi: 10.1038/325728a0.
12. S. Kanaya, Y. Yamada, Y. Kudo, and T. Ikemura. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene, 238(1):143–155, 1999. ISSN 0378-1119. doi: 10.1016/S0378-1119(99)00225-5.
13. S. Kanaya, Y. Yamada, M. Kinouchi, Y. Kudo, and T. Ikemura. Codon usage and tRNA genes in eukaryotes: Correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. Journal of Molecular Evolution, 53(4-5):290–298, 2001. ISSN 0022-2844. doi: 10.1007/s002390010219.
14. Idan Frumkin, Marc J. Lajoie, Christopher J. Gregg, Gil Hornung, George M. Church, and Yitzhak Pilpel. Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proceedings of the National Academy of Sciences, 115(21), May 2018. ISSN 1091-6490. doi: 10.1073/pnas.1719375115.
15. Yi Liu, Qian Yang, and Fangzhou Zhao. Synonymous but not silent: The codon usage code for gene expression and protein folding. Annual Review of Biochemistry, 90(1):375–401, June 2021. ISSN 1545-4509. doi: 10.1146/annurev-biochem-071320-112701.
16. Chien-Hung Yu, Yunkun Dang, Zhipeng Zhou, Cheng Wu, Fangzhou Zhao, Matthew S. Sachs, and Yi Liu. Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Molecular Cell, 59:744–754, September 2015. ISSN 1097-4164. doi: 10.1016/j.molcel.2015.07.018.
17. Mian Zhou, Tao Wang, Jingjing Fu, Guanghua Xiao, and Yi Liu. Nonoptimal codon usage influences protein structure in intrinsically disordered regions. Molecular Microbiology, 97(5):974–987, June 2015. ISSN 1365-2958. doi: 10.1111/mmi.13079.
18. Yi Liu. A code within the genetic code: codon usage regulates co-translational protein folding. Cell Communication and Signaling: CCS, 18:145, September 2020. ISSN 1478-811X. doi: 10.1186/s12964-020-00642-6.
19. Anton A. Komar. A code within a code: How codons fine-tune protein folding in the cell. Biochemistry (Moscow), 86(8):976–991, August 2021. ISSN 1608-3040. doi: 10.1134/s0006297921080083.
20. McKenze J. Moss, Laura M. Chamness, and Patricia L. Clark. The effects of codon usage on protein structure and folding. Annual Review of Biophysics, 53(1):87–108, July 2024. ISSN 1936-1238. doi: 10.1146/annurev-biophys-030722-020555.
21. Xinkai Wu, Mengze Xu, Jian-Rong Yang, and Jian Lu. Genome-wide impact of codon usage bias on translation optimization in Drosophila melanogaster. Nature Communications, 15(1), September 2024. ISSN 2041-1723. doi: 10.1038/s41467-024-52660-4.
22. Dimpal A. Nyayanit, Pragya D. Yadav, Rutuja Kharde, and Sarah Cherian. Natural selection plays an important role in shaping the codon usage of structural genes of the viruses belonging to the Coronaviridae family. Viruses, 13(1):3, December 2020. ISSN 1999-4915. doi: 10.3390/v13010003.
23. Daniele Ramazzotti, Fabrizio Angaroni, Davide Maspero, Mario Mauri, Deborah D’Aliberti, Diletta Fontana, Marco Antoniotti, Elena Maria Elli, Alex Graudenzi, and Rocco Piazza. Large-scale analysis of SARS-CoV-2 synonymous mutations reveals the adaptation to the human codon usage during the virus evolution. Virus Evolution, 8(1), January 2022. ISSN 2057-1577. doi: 10.1093/ve/veac026.
24. Xavier Hernandez-Alias, Hannah Benisty, Martin H. Schaefer, and Luis Serrano. Translational adaptation of human viruses to the tissues they infect.
Cell Reports , 34 ( 11 ): 108872 , March 2021 . ISSN 2211-1247 . doi: 10.1016/j.celrep.2021.108872 . OpenUrl CrossRef PubMed 25. ↵ Neetu Tyagi , Rahila Sardar , and Dinesh Gupta . Natural selection plays a significant role in governing the codon usage bias in the novel sars-cov-2 variants of concern (voc) . PeerJ , 10 : e13562 , June 2022 . ISSN 2167-8359 . doi: 10.7717/peerj.13562 . OpenUrl CrossRef PubMed 26. ↵ Ayan Roy , Fucheng Guo , Bhupender Singh , Shelly Gupta , Karan Paul , Xiaoyuan Chen , Neeta Raj Sharma , Nishika Jaishee , David M. Irwin , and Yongyi Shen . Base composition and host adaptation of the sars-cov-2: Insight from the codon usage perspective . Frontiers in Microbiology , 12 , April 2021 . ISSN 1664-302X . doi: 10.3389/fmicb.2021.548275 . OpenUrl CrossRef PubMed 27. ↵ Patrick Eldin , Alexandre David , Christophe Hirtz , Jean-Luc Battini , and Laurence Briant . Sars-cov-2 displays a suboptimal codon usage bias for efficient translation in human cells diverted by hijacking the trna epitranscriptome . International Journal of Molecular Sciences , 25 ( 21 ): 11614 , October 2024 . ISSN 1422-0067 . doi: 10.3390/ijms252111614 . OpenUrl CrossRef PubMed 28. ↵ Elisa Posani , Maddalena Dilucca , Sergio Forcelloni , Athanasia Pavlopoulou , Alexandros G. Georgakilas , and Andrea Giansanti . Temporal evolution and adaptation of sars-cov-2 codon usage . Frontiers in Bioscience-Landmark , 27 ( 1 ), January 2022 . ISSN 2768-6698 . doi: 10.31083/j.fbl2701013 . OpenUrl CrossRef 29. ↵ Xinkai Wu , Ke-jia Shan , Fuwen Zan , Xiaolu Tang , Zhaohui Qian , and Jian Lu . Optimization and deoptimization of codons in sars-cov-2 and related implications for vaccine development . Advanced Science , 10 ( 23 ), June 2023 . ISSN 2198-3844 . doi: 10.1002/advs.202205445 . OpenUrl CrossRef 30. ↵ Eric P Nawrocki . Faster sars-cov-2 sequence validation and annotation for genbank using vadr . NAR Genomics and Bioinformatics , 5 ( 1 ):qad002, 01 2023 . ISSN 2631-9268 . 
doi: 10.1093/nargab/lqad002 . OpenUrl CrossRef 31. ↵ Fan Wu , Su Zhao , Bin Yu , Yong-Mei Chen , Wei Wang , Zi-Gao Song , Ying-Ying Hu , Zheng-Wei Tao , Yu Huang , Ke Lan , et al. A new coronavirus associated with human respiratory disease in china . Nature , 579 ( 7798 ): 270 – 273 , 2020 . OpenUrl CrossRef PubMed 32. ↵ Xuechao Jia , Xinyu He , Chuntian Huang , Jian Li , Zigang Dong , and Kangdong Liu . Protein translation: biological processes and therapeutic strategies for human diseases . Signal Transduction and Targeted Therapy , 9 ( 1 ), February 2024 . ISSN 2059-3635 . doi: 10.1038/s41392-024-01749-9 . OpenUrl CrossRef 33. ↵ Anton A. Komar , Ekaterina Samatova , and Marina V. Rodnina . Translation rates and protein folding . Journal of Molecular Biology , 436 ( 14 ): 168384 , July 2024 . ISSN 0022-2836 . doi: 10.1016/j.jmb.2023.168384 . OpenUrl CrossRef 34. ↵ Vibhuti Kumar Shah , Priyanka Firmal , Aftab Alam , Dipyaman Ganguly , and Samit Chat-topadhyay . Overview of immune response during sars-cov-2 infection: Lessons from the past . Frontiers in Immunology , 11 , August 2020 . ISSN 1664-3224 . doi: 10.3389/fimmu.2020.01949 . OpenUrl CrossRef PubMed 35. ↵ Elham Torbati , Kurt L. Krause , and James E. Ussher . The immune response to sars-cov-2 and variants of concern . Viruses , 13 ( 10 ): 1911 , September 2021 . ISSN 1999-4915 . doi: 10.3390/v13101911 . OpenUrl CrossRef PubMed 36. ↵ Gareth M Jenkins and Edward C Holmes . The extent of codon usage bias in human rna viruses and its evolutionary origin . Virus Research , 92 ( 1 ): 1 – 7 , March 2003 . ISSN 0168-1702 . doi: 10.1016/s0168-1702(02)00309-x . OpenUrl CrossRef PubMed Web of Science 37. ↵ Wen Luo , Ayan Roy , Fucheng Guo , David M. Irwin , Xuejuan Shen , Junbin Pan , and Yongyi Shen . Host adaptation and evolutionary analysis of zaire ebolavirus: Insights from codon usage based investigations . Frontiers in Microbiology , 11 , November 2020 . ISSN 1664-302X . doi: 10.3389/fmicb.2020.570131 . 
OpenUrl CrossRef 38. ↵ Brayden G. Schindell , Meagan Allardice , Jessica A.M. McBride , Brendan Dennehy , and Jason Kindrachuk . Sars-cov-2 and the missing link of intermediate hosts in viral emergence - what we can learn from other betacoronaviruses . Frontiers in Virology , 2 , July 2022 . ISSN 2673-818X . doi: 10.3389/fviro.2022.875213 . OpenUrl CrossRef 39. Cedric C. S. Tan , Su Datt Lam , Damien Richard , Christopher J. Owen , Dorothea Berchtold , Christine Orengo , Meera Surendran Nair , Suresh V. Kuchipudi , Vivek Kapur , Lucy van Dorp , and François Balloux . Transmission of sars-cov-2 from humans to animals and potential host adaptation . Nature Communications , 13 ( 1 ), May 2022 . ISSN 2041-1723 . doi: 10.1038/s41467-022-30698-6 . OpenUrl CrossRef PubMed 40. ↵ Jonathan E. Pekar , Andrew Magee , Edyth Parker , Niema Moshiri , Katherine Izhikevich , Jennifer L. Havens , Karthik Gangavarapu , Lorena Mariana Malpica Serrano , Alexander Crits-Christoph , Nathaniel L. Matteson , Mark Zeller , Joshua I. Levy , Jade C. Wang , Scott Hughes , Jungmin Lee , Heedo Park , Man-Seong Park , Katherine Ching Zi Yan , Raymond Tzer Pin Lin , Mohd Noor Mat Isa , Yusuf Muhammad Noor , Tetyana I. Vasylyeva , Robert F. Garry , Edward C. Holmes , Andrew Rambaut , Marc A. Suchard , Kristian G. Andersen , Michael Worobey , and Joel O. Wertheim . The molecular epidemiology of multiple zoonotic origins of sars-cov-2 . Science , 377 ( 6609 ): 960 – 966 , August 2022 . ISSN 1095-9203 . doi: 10.1126/science.abp8337 . OpenUrl CrossRef PubMed 41. ↵ Feng Chen , Peng Wu , Shuyun Deng , Heng Zhang , Yutong Hou , Zheng Hu , Jianzhi Zhang , Xiaoshu Chen , and Jian-Rong Yang . Dissimilation of synonymous codon usage bias in virus–host coevolution due to translational selection . Nature Ecology amp; Evolution , 4 ( 4 ): 589 – 600 , March 2020 . ISSN 2397-334X . doi: 10.1038/s41559-020-1124-7 . OpenUrl CrossRef 42. 
↵ Luciana A Castellano , Ryan J McNamara , Horacio M Pallarés, Andrea V Gamarnik , Diego E Alvarez , and Ariel A Bazzini . Dengue virus preferentially uses human and mosquito non-optimal codons . Molecular Systems Biology , 20 ( 10 ): 1085 – 1108 , July 2024 . ISSN 1744-4292 . doi: 10.1038/s44320-024-00052-7 . OpenUrl CrossRef PubMed 43. ↵ Alexander C. Keyel , Alexis Russell , Jonathan Plitnick , Jemma V. Rowlands , Daryl M. Lamson , Eli Rosenberg , and Kirsten St. George . Sars-cov-2 vaccine breakthrough by omicron and delta variants, new york, usa . Emerging Infectious Diseases , 28 ( 10 ), October 2022 . ISSN 1080-6059 . doi: 10.3201/eid2810.221058 . OpenUrl CrossRef 44. Robert S. Paton , Christopher E. Overton , and Thomas Ward . The rapid replacement of the sars-cov-2 delta variant by omicron (b.1.1.529) in england . Science translational medicine , 14 : eabo5395 , Jul 2022 . OpenUrl CrossRef PubMed 45. ↵ Elisa Robles-Escajeda , Jonathon E. Mohl , Lisett Contreras , Ana P. Betancourt , Bibiana M. Mancera , Robert A. Kirken , and Georgialina Rodriguez . Rapid shift from sars-cov-2 delta to omicron sub-variants within a dynamic southern u.s. borderplex . Viruses , 15 ( 3 ): 658 , February 2023 . ISSN 1999-4915 . doi: 10.3390/v15030658 . OpenUrl CrossRef PubMed 46. ↵ Edouard Mathieu , Hannah Ritchie , Lucas Rodés-Guirao , Cameron Appel , Daniel Gavrilov , Charlie Giattino , Joe Hasell , Bobbie Macdonald , Saloni Dattani , Diana Beltekian , Esteban Ortiz-Ospina , and Max Roser . Coronavirus (covid-19) vaccinations . Our World in Data , 2020 . https://ourworldindata.org/covid-vaccinations . 47. ↵ Igor M. Rouzine and Ganna Rozhnova . Evolutionary implications of sars-cov-2 vaccination for the future design of vaccination strategies . Communications Medicine , 3 ( 1 ), June 2023 . ISSN 2730-664X . doi: 10.1038/s43856-023-00320-x . OpenUrl CrossRef 48. ↵ Deepak Jena , Arup Ghosh , Atimukta Jha , Punit Prasad , and Sunil Kumar Raghav . 
Impact of vaccination on sars-cov-2 evolution and immune escape variants . Vaccine , 42 ( 21 ): 126153 , August 2024 . ISSN 0264-410X . doi: 10.1016/j.vaccine.2024.07.054 . OpenUrl CrossRef PubMed View the discussion thread. Back to top Previous Next Posted June 02, 2025. Download PDF Email Thank you for your interest in spreading the word about bioRxiv. NOTE: Your email address is requested solely to identify you as the sender of this article. Your Email * Your Name * Send To * Enter multiple addresses on separate lines or separate them with commas. You are going to email the following Diverse tendencies in codon usage evolution of SARS-CoV-2 genes Message Subject (Your Name) has forwarded a page to you from bioRxiv Message Body (Your Name) thought you would like to see this page from the bioRxiv website. Your Personal Message CAPTCHA This question is for testing whether or not you are a human visitor and to prevent automated spam submissions. Share Diverse tendencies in codon usage evolution of SARS-CoV-2 genes Paweł Błażej , Dorota Mackiewicz , Paweł Mackiewicz bioRxiv 2025.05.30.657095; doi: https://doi.org/10.1101/2025.05.30.657095 Share This Article: Copy Citation Tools Diverse tendencies in codon usage evolution of SARS-CoV-2 genes Paweł Błażej , Dorota Mackiewicz , Paweł Mackiewicz bioRxiv 2025.05.30.657095; doi: https://doi.org/10.1101/2025.05.30.657095 Citation Manager Formats BibTeX Bookends EasyBib EndNote (tagged) EndNote 8 (xml) Medlars Mendeley Papers RefWorks Tagged Ref Manager RIS Zotero Tweet Widget Facebook Like Google Plus One Subject Area Bioinformatics Subject Areas All Articles Animal Behavior and Cognition (7635) Biochemistry (17690) Bioengineering (13892) Bioinformatics (41935) Biophysics (21451) Cancer Biology (18587) Cell Biology (25499) Clinical Trials (138) Developmental Biology (13377) Ecology (19899) Epidemiology (2067) Evolutionary Biology (24318) Genetics (15609) Genomics (22506) Immunology (17736) Microbiology (40394) Molecular 
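The codon (de)optimization mentioned in the concluding paragraph can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`rscu`, `recode`, `translate`) and the toy reference sequence are hypothetical, and relative synonymous codon usage (RSCU) is used here only as one common way to weight codons. The sketch assumes the standard genetic code and recodes a sequence by swapping each codon for the most-used (optimization) or least-used (deoptimization) synonymous codon in a reference set, leaving the encoded protein unchanged.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))

# Group synonymous codons by the amino acid (or stop signal) they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def split_codons(seq):
    """Split an in-frame coding sequence (DNA or RNA) into codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def rscu(seq):
    """RSCU: observed codon count divided by the count expected if all
    synonymous codons for that amino acid were used equally."""
    counts = Counter(split_codons(seq))
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the reference
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def recode(seq, weights, optimize=True):
    """Replace each codon by the synonymous codon with the highest
    (optimize=True) or lowest (optimize=False) weight."""
    pick = max if optimize else min
    return "".join(
        pick(SYNONYMS[CODON_TO_AA[c]], key=lambda s: weights.get(s, 0.0))
        for c in split_codons(seq)
    )


def translate(seq):
    """Translate an in-frame coding sequence with the standard code."""
    return "".join(CODON_TO_AA[c] for c in split_codons(seq))


# Toy data (hypothetical): a tiny "host" reference and a two-codon gene.
reference = "CTGCTGCTGCTAGCCGCCGCT"
gene = "CTAGCT"  # Leu-Ala
weights = rscu(reference)

print(recode(gene, weights, optimize=True))   # CTGGCC
print(recode(gene, weights, optimize=False))  # TTAGCA
```

In practice the weights would come from a measured codon usage table for the host rather than from a toy sequence, and codon frequency is only one of several considerations when a vaccine sequence is designed.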