Unraveling Rare Codon Bias in Actinomycetota: Lineage-Specific and 5’ Terminal Enrichment Across 1936 Genomes
Research Article
Anna Rudenko, Thomas J. Booth, Sam E. Williams, Kai Blin, Hyun Uk Kim, and 2 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7996976/v1 This work is licensed under a CC BY 4.0 License. Status: Published. Journal Publication published 05 Mar, 2026; read the published version in BMC Genomics. Version 1 posted 11
Abstract
Background: Actinomycetota are a diverse phylum of major ecological, medical, and industrial importance, best known for producing antibiotics and other secondary metabolites. While some regulatory mechanisms of secondary metabolism are understood, many remain unresolved. 
Codon usage bias, the preferential use of synonymous codons, represents one potential layer of regulation, as it is known to influence translation efficiency and timing. The availability of thousands of high-quality genomes now enables codon usage to be examined across this phylum at unprecedented scale.
Results: We analyzed codon usage across 1936 high- and medium-quality genomes from 11 genera. The most common codon across the dataset was GCC, particularly enriched in Streptomyces albidoflavus. In contrast, TTA was consistently rare and showed variable distribution across genera. AGA was identified as another rare codon with especially strong enrichment at 5′ termini. Both TTA and AGA were enriched in functional categories such as replication, transcription, and secondary metabolism, and were significantly overrepresented in biosynthetic gene clusters, particularly within biosynthetic and regulatory genes.
Conclusions: These results show that rare codon usage in Actinomycetota reflects both evolutionary history and nonrandom positional enrichment, particularly at 5′ termini, where it may fine-tune translation timing. This positional bias likely represents a conserved mechanism for coordinating gene expression. Beyond providing biological insight, our findings highlight the practical value of codon analysis for synthetic biology, metabolic engineering, and efforts to optimize the expression of biosynthetic gene clusters.
Keywords: Codon Usage Bias, Actinomycetota, Secondary Metabolism, Gene Regulation
Background
Actinomycetota, formerly referred to as Actinobacteria, is a large phylum of gram-positive bacteria that are widely distributed in the environment and play key roles in soil ecosystems [ 1 – 5 ]. Many Actinomycetota have large genomes, often ranging from 8 to 10 Mb, encoding remarkable metabolic versatility and complex regulatory networks [ 6 ]. 
Within Actinomycetota, the genus Streptomyces is renowned for its ability to produce a wide range of bioactive secondary/specialized metabolites, including antibiotics, antifungals, immunosuppressants and antitumor agents [ 7 , 8 ]. These natural products have a profound impact on medicine and agriculture, yet the biosynthetic potential of many Actinomycetota remains untapped. A significant obstacle in harnessing their full potential lies in the poor expression of biosynthetic gene clusters (BGCs) responsible for the production of secondary metabolites under laboratory conditions [ 9 ]. A deeper understanding of the regulatory and genetic mechanisms controlling secondary metabolism would therefore benefit efforts to discover new secondary metabolites from Actinomycetota. Codon usage bias (CUB) refers to the preferential use of certain synonymous codons over others. Codon usage significantly impacts gene expression, protein folding, and fitness by modulating ribosome speed and tRNA availability [ 10 , 11 ]. Codons that match abundant tRNAs are often associated with efficient translation and high expression of essential genes [ 12 ], while rare codons can introduce translational bottlenecks or act as regulatory elements. CUB is shaped by mutational bias, translational selection, and regulatory constraints [ 13 ], and serves as an active regulatory layer, particularly when specific codons depend on conditionally expressed tRNAs [ 14 , 15 ]. In Streptomyces, the impact of codon usage bias is exemplified by the rare leucine codon TTA. TTA is the rarest codon in Streptomyces genomes [ 16 ], occurring in only 2–3% of genes [ 2 ], and plays a key role in regulating gene expression. The scarcity of this codon reflects its exclusive dependence on a single specialized tRNA, \(\text{tRNA}^{\text{Leu}}_{\text{UAA}}\), encoded by the bldA gene [ 14 , 15 ]. 
First characterized in Streptomyces coelicolor , bldA is not constitutively expressed but is induced under specific developmental or environmental conditions, typically during later stages of the life cycle [ 17 ]. One example of a gene regulated by TTA usage is adpA , which is a global transcriptional regulator essential for development and the onset of secondary metabolism in Streptomyces [ 18 ]. Its translation depends on a single TTA codon, making it strictly reliant on bldA . Once translated, AdpA activates genes of BGCs as well as those involved in morphological differentiation [ 19 , 20 ]. In Streptomyces albidoflavus J1074 (formerly Streptomyces albus ), a widely used host for heterologous expression, deletion of bldA abolishes the production of both native and heterologously expressed antibiotics [ 21 ]. Similarly, in Streptomyces tsukubaensis , severe ribosome pausing at a TTA codon within the FK506 BGC creates a translational bottleneck, underscoring the regulatory significance of rare codons during secondary metabolism [ 22 ]. The location of a given codon within a gene also affects translation. Rare codons at the start of coding sequences can reduce translation efficiency by slowing ribosome initiation and altering expression timing [ 10 , 23 , 24 ]. These effects are especially relevant for codons such as TTA, which is known to delay translation when located near the start of transcripts [ 25 ]. Codons located mid-gene may impact folding, while those at the end can influence ribosome release and final folding [ 26 ]. The positioning of rare codons - particularly TTA - appears to contribute to the regulation of gene expression, enabling precise temporal control of developmental and stress-responsive genes in response to environmental conditions [ 14 , 15 ]. 
While current knowledge about the TTA codon and codon usage bias is derived from a limited number of Streptomyces genomes, the increasing availability of high-quality sequences across the phylum allows for broader investigation. In this study, we analyze 1936 Actinomycetota genomes across 11 genera to assess the distribution and positional enrichment of codons. In addition to TTA, we also examine the usage and positional biases of other rare codons and their potential roles in translation regulation. We explore whether the regulatory role of TTA, especially its association with secondary metabolism, is conserved across the phylum or represents a lineage-specific adaptation in Streptomyces. By analyzing codon usage patterns at scale, we provide new insights into the evolutionary and functional dynamics of codon bias in Actinomycetota and highlight potential strategies for manipulating gene expression in industrial and synthetic biology applications.
Results and Discussion
Codon usage patterns across Actinomycetota highlight GCC enrichment in Streptomyces albidoflavus
To explore codon usage bias (CUB) across different genera of Actinomycetota, we calculated codon frequencies for all coding sequences from 1936 genomes as relative usage per strain. This dataset comprises 1448 high-quality genomes (NCBI complete/chromosome level) and 488 medium-quality genomes (CheckM completeness > 90%, contamination < 5%, < 50 contigs). The genomes used in this study and their metadata are listed in Additional file 1, Table S1. To visualize usage patterns, a heatmap was generated and ordered using hierarchical clustering (Fig. 1 A). The clustering captures distinct genus-specific CUBs, with clear groupings observed for Micromonospora, Kribbella and Nocardia. The tight clustering of these genera reflects highly conserved CUB within each group. In contrast, clustering for other genera such as Streptomyces, Rhodococcus and Kitasatospora is more variable. 
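The per-strain relative codon usage described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `codon_frequencies`, the toy sequences, and the choices to count stop codons and to skip codons with ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter

def codon_frequencies(cds_list):
    """Relative codon usage over all coding sequences of one strain.

    Hypothetical helper: returns a dict mapping each observed codon to its
    fraction of all counted codons (fractions sum to 1).
    """
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            # Assumption: codons containing ambiguous bases (e.g. N) are skipped.
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

# Toy "strain" with two short coding sequences (invented data).
toy_cds = ["ATGGCCGCCTTATAA", "ATGGCCCTGTGA"]
freqs = codon_frequencies(toy_cds)
```

In this toy input GCC accounts for three of the nine codons and TTA for one. Stacking such per-strain dictionaries into a strain-by-codon matrix would give the kind of input a clustered heatmap needs.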
GC-rich codons such as GCC, CTG, GGC, GAC and GCG, which encode alanine, leucine, glycine, aspartate, and alanine, respectively, were the most frequently used across all strains. This pattern reflects the high GC content of most Actinomycetota and supports previous findings that GC content is a dominant force shaping codon usage bias in bacteria [ 27 ]. In contrast, the rarest codons were TTA, CTA, ATA, AGA and TTT, which encode leucine, leucine, isoleucine, arginine, and phenylalanine, respectively. Among these, TTA was the least used across all genomes. As the only leucine codon lacking a G or C, TTA is particularly notable for its well-established regulatory role in Streptomyces, in which translation of TTA-containing genes depends on the developmentally controlled bldA tRNA [ 14 ]. Streptomyces albidoflavus and several strains of Kitasatospora showed a marked enrichment of the GCC codon (Fig. 1 A, red square; Fig. 1 B). S. albidoflavus has a relatively compact genome (~ 6.8–7.2 Mb), smaller than the 8–9 Mb genomes typical of most Streptomyces [ 28 , 29 ]. This combination of an elevated GCC percentage and a compact genome appears to be unique to S. albidoflavus: in our dataset, there was no correlation between genome size and GCC usage (R² = 0.02, Fig. 1 C). The elevated GCC usage might reflect selective pressure to optimize translational efficiency by reducing reliance on rare codon-specific tRNAs, potentially contributing to the strain’s robustness and reliability as a chassis for heterologous expression [ 30 – 32 ].
Phylogenetic and genus-level distribution of TTA codon usage
Because of its established regulatory role in several Streptomyces strains and to explore its potential evolutionary significance, we investigated the distribution of the TTA codon across the entire dataset. First, we quantified the frequency of the TTA codon across all genomes by calculating the percentage of genes containing at least one TTA codon for each strain. 
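As a hedged illustration of the per-strain metric just described (the percentage of genes with at least one TTA codon), the following Python sketch counts only in-frame occurrences. The helper names and the toy genes are hypothetical, and reading codons strictly in frame from the first base is an assumption about how the coding sequences are supplied.

```python
def has_codon(cds, codon="TTA"):
    """True if `codon` occurs in frame (reading from position 0) in `cds`."""
    cds = cds.upper()
    return any(cds[i:i + 3] == codon for i in range(0, len(cds) - 2, 3))

def percent_genes_with_codon(cds_list, codon="TTA"):
    """Percentage of coding sequences containing at least one in-frame codon."""
    if not cds_list:
        return 0.0
    hits = sum(has_codon(seq, codon) for seq in cds_list)
    return 100.0 * hits / len(cds_list)

# Toy genome (invented data). The third gene contains the string "TTA" only
# out of frame (ATG ATT AGC TGA), so it is not counted.
toy_genome = [
    "ATGTTAGGCTAA",
    "ATGGCCTGA",
    "ATGATTAGCTGA",
    "ATGCTGTTATTATGA",
]
pct_tta = percent_genes_with_codon(toy_genome)
```

Here two of the four toy genes carry an in-frame TTA, so `pct_tta` is 50.0. A plain substring search would also have counted the third gene, which is why the frame-aware check matters.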
To place these patterns in an evolutionary context, we reconstructed a species tree using getphylo [ 33 ] (Fig. 2 ) and rooted it at the most recent common ancestor (MRCA) of the genus Kribbella based on prior phylogenomic studies [ 34 ]. We then assessed the emergence and distribution of TTA codon enrichment as a potential evolutionary development across lineages. Across all genomes, the percentage of genes containing the TTA codon ranged from as low as 0.5% to as high as 30.1%, highlighting substantial variation in the prevalence of this rare codon (Fig. 3 A). The Nocardiaceae genera Nocardia and Rhodococcus exhibited both the highest median percentage of TTA-containing genes and the greatest variability across strains, with medians of 8.9% and 8.4% and ranges of 2.7%–30.1% and 1.8%–22.0%, respectively. This combination of elevated medians and broad distributions suggests a complex evolutionary dynamic. Phylogenetic analysis indicates that enrichment of TTA-containing genes likely arose independently in multiple lineages, including within the Nocardiaceae family. Other genera may share a similar evolutionary trajectory, although current small sample sizes for some genera may be insufficient to fully resolve these patterns. This observation supports the idea that elevated TTA codon usage may reflect adaptive responses to lineage-specific ecological and regulatory demands. In Streptomyces , the rare TTA codon is decoded by the bldA -encoded tRNA, which plays a key role in controlling developmental processes such as sporulation and the activation of secondary metabolism [ 2 ]. In contrast, Nocardia and Rhodococcus are non-sporulating genera [ 35 , 36 ], and thus lack this developmental context for TTA regulation. In these taxa, high TTA usage is likely to be associated with alternative regulatory roles. Although the exact functions remain unknown, they may include stress response regulation, metabolic versatility, or control of horizontally acquired genes. 
Such lineage-specific differences in codon function are consistent with previous observations that codon usage patterns in Actinomycetota can be shaped by ecological and evolutionary pressures [ 37 ]. In contrast, a much lower percentage of genes in Streptomyces contain the TTA codon, ranging from 0.8% to 6% with a median of 2.4%, indicating a more constrained pattern across strains. Kitasatospora (0.5%–2.5%, median 1.5%) and Pseudonocardia (0.8%–1.6%, median 1.2%) showed the lowest median values with minimal variability. The narrower distributions observed in these genera suggest that TTA codon usage is under stronger stabilizing constraints or has remained relatively unchanged over extended evolutionary periods, potentially reflecting conserved regulatory strategies or reduced selective pressures for TTA codon expansion. The percentage of genes containing TTA in Actinokineospora, Kribbella, Saccharopolyspora, Amycolatopsis, Saccharothrix, and Micromonospora fell between these extremes. For example, in Actinokineospora, 1.8%–11.4% of genes contained at least one TTA codon with a relatively high median of 8.3%, whereas in Kribbella, the range was 2.5%–13.1% with a median of 5.7%. Although Kribbella was used as the root of the phylogenetic tree, its moderate and variable TTA percentage suggests that TTA codon usage underwent repeated, lineage-specific gains and losses in multiple clades since the divergence of Kribbella. Despite these differences among genera, TTA remained the rarest codon in nearly all strains analyzed, with only three exceptions: one Kribbella strain where ATA (isoleucine) was rarer, one Nocardia strain where both ATA and AGA (arginine) were rarer, and one Saccharopolyspora strain where ATA was rarer (indicated by stars in Fig. 2 ). Next, we examined the relationship between the proportion of genes that contain the rare TTA codon and the overall TTA codon usage per genome (Fig. 3 B). 
Across all genomes, this relationship was highly linear (r = 0.99), indicating that TTA codon usage is broadly distributed among the genes in a genome rather than concentrated in outliers. When analyzed at the genus level (Additional file 2: Fig. S1), this pattern remained consistent in most taxa, with Pearson correlation coefficients ranging from r = 0.96 to r = 1.00. A notable exception was Streptomyces, which exhibited a weaker correlation (r = 0.74), indicating greater heterogeneity in the number of TTA codons per gene. This deviation was driven by a single outlier genome, Streptomyces sp. f51, which carried a hypothetical plasmid sequence consisting of low-complexity ORFs containing hundreds of TTA codons. These sequences artificially inflated rare-codon counts relative to the proportion of TTA-positive genes. Removing this genome restored the correlation to r = 0.99 (Additional file 2: Fig. S2), consistent with the other genera. To test whether overall base composition shapes rare codon usage, we examined how genome-wide GC content correlates with the proportion of genes containing at least one TTA codon (Fig. 3 C). Across all genomes, a strong inverse correlation was observed (r = −0.85), indicating that genomes with lower GC content tend to harbor a higher fraction of genes containing the AT-rich TTA codon. Genus-level analysis (Additional file 2: Fig. S3) showed that this pattern is largely conserved across the genera, with individual genera exhibiting correlation values ranging from r = −0.90 in Micromonospora to r = −0.36 in Streptomyces. The relatively weak correlation in Streptomyces, despite its characteristically high GC content, suggests that TTA codon presence in this genus may also be maintained by conserved regulatory functions, particularly those tied to bldA-mediated control of key developmental and biosynthetic genes, rather than being solely explained by genomic composition. 
Rhodococcus and Nocardia, where strong GC–TTA correlations exist and TTA codon usage is high, may use the TTA codon less prominently as a dedicated regulatory element, with its occurrence more strongly influenced by local sequence context or overall nucleotide composition. Nevertheless, given that these organisms still encode only a single dedicated \(\text{tRNA}^{\text{Leu}}_{\text{UAA}}\) (Additional file 3: Table S2), TTA codon usage may remain relevant at the translational level, potentially affecting translation efficiency or timing in specific genes despite the absence of a broader regulatory mechanism.
Rare codons are enriched at the 5' termini of genes
As noted in the introduction, the position of codons within a gene can influence translation initiation and protein folding. We therefore analyzed the positional distribution of all codons. For comparability across genes of varying lengths, each coding sequence was divided into ten equal-sized segments (bins 0–9) across the entire length of the gene. Bin 0 corresponds to the bin closest to the 5' terminus while bin 9 corresponds to the bin closest to the 3' terminus. For each codon, we calculated the number of times it fell within each bin relative to its total count in the gene. Codon rarity was defined as the square root transformation of the inverse of the total codon counts across all positional bins, normalized to range from 0 (least rare) to 1 (most rare). This transformation reduces the skew caused by highly abundant codons and provides a more balanced scale for comparing codons of different overall frequencies. To quantify the relationship between codon rarity and positional enrichment, we calculated both Spearman and Pearson correlations (Additional file 3: Tables S3 and S4) between codon rarity and normalized frequencies in each bin. Using this rarity metric, we next examined how codons were distributed along genes. 
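The binning and rarity definitions above can be made concrete with a small Python sketch. It is one possible reading of the description, not the authors' code: the integer rule used to assign codon positions to bins, the pooling of counts across genes before computing per-bin fractions, the min-max normalization of rarity, and all function names are assumptions.

```python
import math
from collections import defaultdict

N_BINS = 10  # bins 0-9; bin 0 is nearest the 5' terminus

def bin_counts(cds_list, n_bins=N_BINS):
    """Count how often each codon falls into each positional bin."""
    counts = defaultdict(lambda: [0] * n_bins)
    for seq in cds_list:
        seq = seq.upper()
        n_codons = len(seq) // 3
        for i in range(n_codons):
            codon = seq[3 * i:3 * i + 3]
            # Assumed binning rule: scale the codon index to 0..n_bins-1.
            b = min(n_bins - 1, i * n_bins // n_codons)
            counts[codon][b] += 1
    return dict(counts)

def bin_fractions(counts):
    """Per-codon share of occurrences in each bin (each row sums to 1)."""
    return {c: [x / sum(v) for x in v] for c, v in counts.items()}

def rarity_scores(counts):
    """sqrt(1 / total count), min-max scaled to 0 (least rare) .. 1 (rarest)."""
    raw = {c: math.sqrt(1.0 / sum(v)) for c, v in counts.items()}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {c: 0.0 for c in raw}
    return {c: (r - lo) / (hi - lo) for c, r in raw.items()}

# Toy gene of 20 codons (invented): TTA first, AGA last, GCC in between.
toy_gene = "TTA" + "GCC" * 18 + "AGA"
counts = bin_counts([toy_gene])
fractions = bin_fractions(counts)
rarity = rarity_scores(counts)
```

In the toy gene the single TTA lands in bin 0 and the single AGA in bin 9, and both receive a rarity of 1.0, while the abundant GCC receives 0.0. Correlating `rarity` with each bin's column of `fractions` across codons would then correspond to the per-bin Spearman and Pearson analysis described in the text.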
Across all genera, most codons with higher usage were distributed relatively uniformly across the entire gene length. In contrast, rarer codons exhibited a marked enrichment in bin 0 and a weaker enrichment in bin 9 (Fig. 4 ). While both correlation measures showed similar trends, Spearman coefficients were prioritized for interpretation because they do not assume a linear relationship between variables (Additional file 3: Tables S3 and S4). Across all genera, the strongest positive associations between codon rarity and positional frequency occurred in bin 0 (Spearman = 0.69–0.84), confirming the pronounced enrichment of rare codons at the 5′ terminus (Fig. 5). In contrast, bins 1–8 showed a marked depletion of rare codons, reflected in consistently negative correlations (typically − 0.5 to − 0.8), with the most pronounced values in bins 3–7. This pattern was consistent across genera, interrupted only by a few instances of negative correlations close to 0 (e.g., bin 1 in Streptomyces and Kitasatospora ). Correlations increased again in bin 9 (0.42–0.79), indicating a secondary but weaker enrichment at the 3′ terminus (Additional File 2: Fig. S4). Together, these results reveal a U-shaped positional bias: rare codons preferentially cluster at gene ends and are underrepresented in the central coding region. This positional bias in rare codons suggests they may play regulatory roles at gene termini, influencing translation dynamics or timing. We next focused specifically on the TTA and AGA codons. We included AGA because, like TTA, it is a rare codon and showed strong enrichment at the 5' termini in several genera (Fig. 4 ). Heatmaps of TTA codon frequencies across bins revealed genus-specific patterns (Fig. 6 A). Strong enrichment of TTA at the 5' termini was observed in Streptomyces , Kitasatospora and Saccharothrix , consistent with known regulatory roles of TTA in Streptomyces . 
In these genera, TTA likely acts as a translational checkpoint regulated by bldA [ 2 , 38 ]. Mechanistically, 5' termini enrichment of rare codons such as TTA may delay early elongation when the corresponding tRNA is scarce, thereby restricting translation until specific developmental or environmental signals induce bldA expression. In contrast, genera such as Nocardia, Rhodococcus and Pseudonocardia exhibited a more uniform internal distribution of TTA codons despite their relatively high percentage of TTA-containing genes. This pattern suggests that, in these genera, TTA may play a regulatory role through its overall prevalence rather than through preferential positioning at specific gene termini. Other genera, including Saccharopolyspora, Actinokineospora, Amycolatopsis and Kribbella, displayed moderate enrichment at the 5' end but less pronounced than in Streptomyces. A weaker enrichment of TTA was also noted at the 3' end in certain genera, for example Pseudonocardia, Amycolatopsis and Kribbella, while others such as Rhodococcus showed no enrichment at either terminus. Rare codon-induced pausing near stop codons has been associated with changes in transcript longevity or ribosome release [ 39 ].
Figure 5. Pearson and Spearman correlation between normalized codon rarity and bin 0 (gene start) enrichment across genera. Each panel corresponds to a different genus. The x-axis indicates normalized rarity (0–1, with 1 being rarest); the y-axis represents the proportion of each codon found in bin 0. Dashed lines indicate linear regression fits based on Pearson correlation. The five most frequently and five least frequently used codons are labelled.
A similar analysis for the AGA codon showed enrichment patterns that mirrored those of TTA in several genera (Fig. 6 C). 
Strong 5' enrichment of the AGA codon was observed in Micromonospora, Saccharothrix, Kitasatospora, Amycolatopsis and Streptomyces, suggesting that AGA might also contribute to regulation of translation initiation. Although AGA is not decoded by the bldA-encoded tRNA and instead relies on a distinct tRNA pool, its positional bias indicates it may serve a similar regulatory function. Increasing the availability of \(\text{tRNA}^{\text{Arg}}\) could potentially enhance the translation of AGA-rich genes, providing a mechanism for modulating their expression. Since a fraction of these AGA-rich genes are located within BGCs (Additional file 2: Fig. S5), changes in tRNA availability could affect secondary metabolism. In Saccharopolyspora, Kribbella and Pseudonocardia, AGA showed modest enrichment at both termini. To complement the positional analysis and assess potential biases introduced by gene length normalization, we also quantified the frequency of TTA and AGA codons within the first and last 10 amino acids of coding sequences (Fig. 6 B, D). This analysis confirmed the positional analysis findings: both TTA and AGA codons were preferentially localized at the 5' end in several genera, particularly Streptomyces, Kitasatospora and Saccharothrix.
TTA and AGA codons are enriched in the COG categories Replication, Recombination and Repair and Transcription
We next examined the functional categories associated with genes containing rare codons based on Clusters of Orthologous Groups (COG) annotations. Annotations were assigned by eggNOG-mapper version 2 [ 40 ] based on eggNOG database version 5 [ 41 ] and COG classifications [ 42 ]. For TTA and AGA we defined two gene sets: (i) genes containing at least one instance of the codon anywhere in the coding sequence (gene-wide set), and (ii) genes in which the codon occurs within the first 10% of the coding sequence (5’ terminus set). 
Within each strain, we investigated whether TTA and AGA codons were overrepresented in any COG category relative to the genomic background using a hypergeometric test. Significance was assessed at p < 0.05. To compare across taxa, we report for each genus the proportion of strains showing significant enrichment in each COG category. The results are provided in Additional file 3, Tables S5-S8 (gene-wide: S5, S7; 5’ termini: S6, S8). For the gene-wide set, TTA codons were most often enriched in categories related to Replication, Recombination and Repair, with very high frequencies in Nocardia (90%), Amycolatopsis (87%) and Pseudonocardia (86%) (Additional file 3: Table S5). Enrichment in Transcription was also seen in Nocardia (68%), Rhodococcus (59%) and Streptomyces (58%). The category Secondary Metabolism was enriched in several genera, for example Actinokineospora (73%) and Nocardia (55%). Although TTA is known to affect secondary metabolism in Streptomyces, only 25% of Streptomyces strains showed enrichment of TTA within this category. A survey of 213 Streptomyces genomes reported a similar trend for the TTA codon: enrichment in Replication, Recombination and Repair and in Transcription, but also in Secondary Metabolism [ 43 ]. Similarly, AGA showed strong enrichment in Replication, Recombination and Repair (e.g., Kitasatospora, 97%; Streptomyces, 94%) and Transcription (e.g., Rhodococcus, 90%; Nocardia, 90%) (Additional file 3: Table S7). However, unlike TTA, AGA enrichment was detected in a higher proportion of strains across multiple genera, indicating that its functional associations in these two categories are more broadly conserved. In Secondary Metabolism, AGA showed lower enrichment than TTA: the average among all genera was 29% for TTA versus 22% for AGA. Although enrichment of AGA in Secondary Metabolism was less pronounced than that observed for TTA, it was still enriched in several genera, particularly Nocardia (67%) and Rhodococcus (43%). 
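The per-strain hypergeometric enrichment test and the per-genus summary described above can be sketched in Python using only the standard library. This is an illustrative assumption about the set-up, not the authors' implementation: the test is taken to be one-sided (overrepresentation), the function names are hypothetical, and the gene counts in the example are invented.

```python
from math import comb

def hypergeom_enrichment_p(k, N, K, n):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the genome (background)
    K: genes annotated to the category of interest
    n: genes containing the codon (e.g. TTA)
    k: genes that are both in the category and contain the codon
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def fraction_significant(p_values, alpha=0.05):
    """Share of strains in a genus whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Invented toy numbers: 10 genes, 4 in the category, 3 with TTA, 2 overlapping.
p_toy = hypergeom_enrichment_p(k=2, N=10, K=4, n=3)

# Invented per-strain p-values for one genus and one category.
genus_share = fraction_significant([0.01, 0.20, 0.04, 0.50])
```

For the toy counts the tail probability is 40/120, i.e. one third, so this category would not pass the p < 0.05 threshold. The second helper mirrors the reported quantity, the proportion of strains per genus with significant enrichment.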
For the TTA 5’ terminus set (Additional file 3: Table S6), Cell Wall, Membrane and Envelope Biogenesis was the most frequently enriched COG category across strains. This effect was pronounced in Kribbella (95%) and Actinokineospora (91%). The Secondary Metabolism category also showed high 5' terminus enrichment, with all Saccharothrix and 74% of Kitasatospora strains showing enrichment. In contrast, analysis of the AGA codon (Additional file 3: Table S8) revealed 5’ terminus enrichment in nearly all strains spanning many genera and functional categories. For example, genes related to Amino Acid and Carbohydrate Metabolism, Cell Wall Biogenesis and Secondary Metabolism were enriched in > 90% of strains in nearly all genera. Although TTA and AGA differ in the proportion of strains showing enrichment in each genus, they occur in largely overlapping functional categories. Both codons are consistently enriched in Replication, Recombination and Repair, Transcription and Secondary Metabolism, with AGA showing a particularly strong signal in Transcription. Enrichment at the 5′ terminus suggests that both codons contribute to translational control, complementing transcriptional regulation. Notably, genus-specific patterns (e.g., gene-wide AGA enrichment in Rhodococcus and Nocardia, but absence in Actinokineospora and Amycolatopsis) highlight possible lineage-specific regulatory roles.
antiSMASH-based BGC annotation reveals greater AGA enrichment compared to TTA
In addition to COG, we employed antiSMASH 8.0 to predict the BGCs within each strain and to obtain more detailed information about the genes within the BGCs [ 44 ]. We then examined the occurrence of TTA and AGA codons within BGCs. For each strain, we first calculated the proportion of BGCs containing at least one gene with TTA or AGA, representing the overall prevalence of these two codons at the BGC level (Fig. S5). TTA codons (Fig. 
S5A) were found in BGCs of nearly all genera but at varying frequency. High prevalence was observed in Nocardia, Rhodococcus and Kribbella and lower prevalence in Kitasatospora and Pseudonocardia. In contrast, AGA codons (Fig. S5B) showed less variability, displaying a more uniform and generally higher prevalence than TTA. To assess whether BGCs contain more rare codons than expected given the genomic background, we applied a hypergeometric test to each genome (Fig. 7 A). Both TTA and AGA codons were significantly overrepresented in BGCs, but the proportion of strains with significant enrichment differed among genera. AGA enrichment was highest in Kribbella (91% of strains) and Actinokineospora (82%), whereas TTA was highest in Streptomyces (71%) and Kitasatospora (70%). For comparison, when using COG-based annotations, TTA enrichment in secondary metabolism was detected in only 25% of Streptomyces genomes, highlighting that antiSMASH provides a more accurate framework for studying secondary metabolism. Some genera displayed opposite codon preferences: in Kribbella, AGA was highly enriched (91%) while TTA was much less frequent (28%), whereas in Streptomyces TTA enrichment (71%) exceeded AGA (57%). The results differed markedly depending on whether enrichment in secondary metabolism genes was assessed using COG or antiSMASH. With COG, enrichment was detected in an average of 29% of strains among all genera for TTA and 22% for AGA, whereas antiSMASH-based annotations yielded higher values of 37% for TTA and 51% for AGA. This discrepancy highlights that COG functional classification does not fully capture BGC information, and that specialized tools such as antiSMASH provide a more accurate representation of secondary metabolism. Notably, while COG-based analyses suggested that TTA was more strongly enriched, antiSMASH results indicate that AGA enrichment is even more pronounced, underscoring its potential importance in the regulation of secondary metabolism. 
Values for AGA and TTA enrichment across genera are provided in Additional file 3, Table S9. To resolve which functional gene categories within BGCs drive these signals, we applied Fisher’s exact test for each genome, comparing codon presence versus absence in each antiSMASH annotation category (biosynthetic, biosynthetic-additional, regulatory, resistance, transport, and others). For both codons, biosynthetic genes showed the highest overall enrichment (Fig. 7 B–C). Among biosynthetic genes, AGA showed the highest enrichment in Nocardia (81% of strains) and Rhodococcus (78%). TTA was also enriched in these genera, but generally at lower frequencies; in Nocardia , however, TTA reached levels comparable to AGA. Regulatory genes displayed a distinct pattern: TTA codons were frequently enriched in this category, with up to 45% of Kitasatospora and 43% of Streptomyces strains showing significance, whereas AGA codons were less often enriched in regulatory genes in these taxa. However, in other genera, including Micromonospora , Amycolatopsis and Actinokineospora , AGA enrichment in regulatory genes exceeded that of TTA. Together, these results demonstrate that rare codons are not randomly distributed within BGCs, with both genus-specific and functional-category–specific patterns. TTA enrichment in regulatory genes supports the established model in which rare codons act as translational checkpoints to control BGC activation, particularly in Streptomyces and Kitasatospora . At the same time, the consistent presence of both codons in biosynthetic core genes indicates an additional role in tuning enzyme expression, potentially affecting pathway efficiency, flux, and timing. Importantly, the enrichment of AGA codons in regulatory genes in several genera highlights that codon-mediated regulation is more diverse than previously recognized and not restricted to TTA, contributing to the regulation and fine-tuning of BGC expression across Actinomycetota. 
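The per-genome Fisher's exact test on codon presence versus absence within one antiSMASH gene category can be sketched as follows. The source does not state whether the test was one- or two-sided or how the comparison group was defined, so this standard-library Python example assumes a one-sided test for enrichment against all other BGC genes; the function names, category labels as strings, and toy genes are hypothetical.

```python
from math import comb

def contingency(genes, category, codon="TTA"):
    """Build a 2x2 table (a, b, c, d) from (category_label, codon_set) records.

    a: in category, codon present    b: in category, codon absent
    c: other genes, codon present    d: other genes, codon absent
    """
    a = b = c = d = 0
    for label, codons in genes:
        present = codon in codons
        if label == category:
            if present:
                a += 1
            else:
                b += 1
        elif present:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(X >= a) with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hi = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, hi + 1))
    return tail / comb(n, col1)

# Invented toy BGC genes: (antiSMASH-style category, set of codons in the gene).
toy_genes = (
    [("biosynthetic", {"TTA", "GCC"})] * 3 + [("biosynthetic", {"GCC"})]
    + [("regulatory", {"TTA"})] + [("transport", {"GCC"})] * 3
)
table = contingency(toy_genes, "biosynthetic", codon="TTA")
p_fisher = fisher_exact_greater(*table)
```

The toy table is (3, 1, 1, 3), giving a one-sided p-value of 17/70, roughly 0.24. Repeating the test per genome and per category, then tallying the share of strains below the significance threshold, would correspond to the genus-level percentages reported in the text.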
Such lineage-specific reliance on codon usage could help explain why BGC activation varies widely across taxa and why many natural products remain cryptic under standard laboratory conditions.

Conclusion

This study presents a systematic analysis of codon usage patterns in 1936 high- and medium-quality Actinomycetota genomes across 11 genera. We show that the proportion of TTA-containing genes is strongly associated with genomic GC content, with lower-GC genomes generally carrying more TTA codons. Streptomyces is a notable exception: despite its high GC content, it retains TTA codons in key regulatory genes, consistent with preservation for specific functional use rather than compositional bias.

Beyond compositional trends, we observed a clear overall pattern in which rarer codons are preferentially enriched at the 5′ termini of genes. Both TTA and AGA codons were most consistently concentrated within the first 10% of coding sequences (bin 0), highlighting this region as a hotspot for translational control. Within this shared trend, codon-specific differences emerged: AGA codons showed enrichment at gene starts across multiple genera, suggesting a broadly conserved role in modulating translation initiation. In contrast, TTA codon enrichment at the 5′ end was more genus-specific, with strong signals in taxa such as Kitasatospora, Saccharothrix and Streptomyces but weaker signals in Nocardia and Rhodococcus.

Functional enrichment analysis using COG categories further revealed that both TTA and AGA codons are associated with cellular processes such as Replication, Recombination and Repair, Transcription, Secondary Metabolism and Cell Wall Biogenesis. While the two codons often overlapped in enriched categories, the strength and distribution of their enrichment differed. This observation suggests that AGA, like TTA, may play a broader role in regulatory tuning across Actinomycetota.
Within BGCs, both codons were significantly overrepresented compared to the genomic background, with the strongest enrichment observed in biosynthetic genes, followed by regulatory genes. Overall, AGA codons tended to show stronger enrichment in biosynthetic genes, particularly in genera such as Rhodococcus and Amycolatopsis, whereas in Nocardia both codons reached similarly high levels. In contrast, TTA codons were more frequently enriched in Streptomyces and Kitasatospora, supporting their role as translational checkpoints in these taxa. In several other genera, including Micromonospora, Actinokineospora, and Nocardia, enrichment of AGA codons exceeded that of TTA in regulatory genes of BGCs. Together, these patterns indicate that both codons contribute to the fine-tuning of secondary metabolism.

This work lays the foundation for several future directions. Experimental validation using ribosome profiling or codon-specific reporter constructs could test the functional consequences of TTA and AGA enrichment at the 5′ and 3′ termini. Engineering strategies for heterologous expression of biosynthetic gene clusters may benefit from codon optimization approaches that account for both positional and phylogenetic codon usage biases. Finally, integrating codon usage with analyses of horizontal gene transfer could clarify how rare codons contribute to genome plasticity and the modular regulation of secondary metabolism. In practical terms, when expressing BGCs, either in their native host or in a heterologous chassis, special attention should be given to rare codons within the first 10% of the coding sequence, as this region is most likely to influence translation initiation and overall expression efficiency.

Methods

Data collection and quality check

Genome sequences for 11 genera within the Actinomycetota phylum were downloaded from the NCBI RefSeq database on November 13, 2024.
To ensure data quality and taxonomic consistency, several filtering steps were applied. First, genomes that failed the Average Nucleotide Identity (ANI) taxonomy check were removed from the dataset. Subsequently, the genomes were categorized into three quality groups based on assembly completeness, contamination levels, and the number of contigs. High-quality (HQ) genomes included those annotated as complete or at the chromosome level by NCBI. Medium-quality (MQ) genomes were defined as those with completeness > 90% and contamination < 5% based on CheckM version 1.2.2 [45], and fewer than 50 contigs. Low-quality (LQ) genomes, which did not meet these criteria, were excluded from further analysis. Genera were selected for downstream analysis if at least 10 high- and/or medium-quality genomes were available, ensuring robust comparative analyses (Table 1). For the genus Streptomyces, only high-quality genomes were used, since this genus was highly overrepresented among Actinomycetota. Following these filters, the final dataset comprised 1936 genomes. The corresponding accession IDs and metadata for all included genomes are provided in Additional file 1, Table S1.

Table 1 Summary of genomes included in this study. HQ: high quality; MQ: medium quality.

Genus               HQ     MQ     Total
Actinokineospora    1      10     11
Amycolatopsis       26     34     60
Kitasatospora       37     40     77
Kribbella           11     46     57
Micromonospora      71     113    184
Nocardia            36     127    163
Pseudonocardia      1      13     14
Rhodococcus         108    78     186
Saccharopolyspora   16     6      22
Saccharothrix       4      11     15
Streptomyces        1147   None   1147

Representative genome selection using MASH

To identify representative genomes among duplicates, pairwise genomic distances were calculated using MASH [46], and genomes with distances below a specified threshold (0.001) were considered duplicates. A graph-based approach was employed in which each genome was represented as a node, and edges were drawn between nodes with pairwise distances below the threshold.
The Louvain community detection algorithm [47] was applied to group genomes into clusters based on their connectivity within the graph. For each cluster, a representative genome was selected. In clusters containing only one genome, that genome was directly chosen as the representative. For clusters with multiple genomes, the genome with the smallest average pairwise distance to all others in the cluster was selected, so that the representative was the most central and typical genome within the group.

Phylogenetic tree construction using getphylo

We used the getphylo pipeline to construct a phylogenetic tree of the analyzed genomes based on 23 conserved single-copy loci [33] and visualized the resulting tree using the Interactive Tree of Life (iTOL) [48]. The tree was rooted at the most recent common ancestor of the genus Kribbella, which was selected based on its phylogenetic placement as a basal lineage in Actinomycetota [34] and to enable clearer interpretation of codon usage evolution across derived clades.

Codon usage data extraction

Annotated genomic data were processed to extract codon frequencies and their positional distribution within genes. The coding DNA sequences (CDSs) were retrieved from the GenBank files of the curated genomes. Relative codon positions were calculated by dividing the CDS length into 10 equal bins (referred to as position bins 0–9), with bin 0 representing the 5′ start of the gene and bin 9 representing the 3′ end. Codon counts for each positional bin were computed for all CDSs in each genome. Codon frequencies were normalized by calculating the proportion of each codon within its respective bin relative to the total codon counts across all bins for each genus.
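The positional binning described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `codon_bin_counts` and `bin_proportions` are hypothetical, CDSs are assumed to be in frame, and the normalization shown is one possible reading of the text (each codon's count in a bin divided by that codon's total across all bins).

```python
from collections import Counter

N_BINS = 10  # position bins 0-9; bin 0 is the 5' end of the CDS


def codon_bin_counts(cds_list, n_bins=N_BINS):
    """Tally codons per relative-position bin over a collection of CDSs.

    Returns a list of n_bins Counters; counts[b][codon] is the number of
    times `codon` occurs in bin b across all supplied sequences.
    """
    counts = [Counter() for _ in range(n_bins)]
    for cds in cds_list:
        cds = cds.upper()
        n_codons = len(cds) // 3  # a trailing partial codon is ignored
        for i in range(n_codons):
            codon = cds[3 * i:3 * i + 3]
            # i / n_codons lies in [0, 1), so the bin index is 0..n_bins-1
            bin_idx = i * n_bins // n_codons
            counts[bin_idx][codon] += 1
    return counts


def bin_proportions(counts, codon):
    """Fraction of a codon's occurrences that fall in each bin (assumed normalization)."""
    per_bin = [c[codon] for c in counts]
    total = sum(per_bin)
    return [x / total if total else 0.0 for x in per_bin]


if __name__ == "__main__":
    # Toy CDS of 20 codons with a single TTA directly after the start codon.
    toy_cds = "ATGTTA" + "GCC" * 18
    counts = codon_bin_counts([toy_cds])
    print(bin_proportions(counts, "TTA"))
```

In practice the CDS strings would come from the GenBank annotations of each genome; the toy sequence here only demonstrates how a codon near the start of a gene is assigned to bin 0.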
The number of \(\text{tRNA}^{\text{Leu}}_{\text{UAA}}\) and \(\text{tRNA}^{\text{Arg}}_{\text{UCU}}\) genes, responsible for decoding the rare codons TTA and AGA respectively, was determined by parsing the GenBank files for each strain and identifying tRNA features annotated with the corresponding anticodons in the anticodon qualifiers.

Codon rarity calculation

Codon rarity was calculated to quantify the infrequency of each codon across all positional bins for each genus. First, the counts of each codon were summed across all bins. Rarity was then calculated using a square root transformation of the inverse of the total codon counts:

$$\text{Rarity} = \sqrt{\frac{1}{\text{Total Counts}}}$$

The square root transformation was applied to reduce the skewness inherent in the distribution of codon counts, thereby emphasizing differences between rarer codons while maintaining the overall rarity trend. While the transformation may not substantially alter correlation outcomes compared to raw codon frequencies, it helps normalize extreme count differences and improves interpretability across codons with very different usage levels. To enable comparability across genera, min-max normalization was subsequently applied:

$$\text{Normalized Rarity} = \frac{\text{Rarity} - \min(\text{Rarity})}{\max(\text{Rarity}) - \min(\text{Rarity})}$$

This normalization scaled the rarity values between 0 (least rare codon) and 1 (rarest codon).
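As a minimal sketch of the two formulas above (assuming a per-genus dictionary of total codon counts; the function name `normalized_rarity` and the example counts are hypothetical and are not taken from the study), rarity and its min-max normalization could be computed as follows. Codons with a zero count are skipped here because the inverse is undefined; how the original analysis handled that case is not stated.

```python
import math


def normalized_rarity(total_counts):
    """Return min-max normalized rarity per codon for one genus.

    total_counts maps codon -> total count summed over all position bins.
    Rarity = sqrt(1 / total count); values are then rescaled to [0, 1].
    """
    # Assumption: codons that never occur are left out (1/0 is undefined).
    rarity = {codon: math.sqrt(1.0 / n) for codon, n in total_counts.items() if n > 0}
    lowest, highest = min(rarity.values()), max(rarity.values())
    if highest == lowest:
        # Degenerate case: all codons equally frequent.
        return {codon: 0.0 for codon in rarity}
    return {codon: (r - lowest) / (highest - lowest) for codon, r in rarity.items()}


if __name__ == "__main__":
    # Illustrative counts only.
    example = {"GCC": 10000, "AGA": 400, "TTA": 100}
    for codon, value in normalized_rarity(example).items():
        print(codon, round(value, 3))
```

With these illustrative counts the most frequent codon maps to 0 and the least frequent to 1, matching the scaling described in the text.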
The normalized rarity values were subsequently used in all correlation analyses and visualizations to explore positional biases and relationships with codon enrichment across the genome.

Correlation analysis between codon rarity and positional enrichment

Pearson and Spearman correlation coefficients were calculated between the normalized codon rarity values and the corresponding codon frequencies in each position bin (0–9). Correlation coefficients were computed separately for each genus. The correlations were visualized as scatter plots with trendlines, highlighting the enrichment of rare codons in specific bins. Correlation trends across bins were summarized for all genera in tabular format, and genus-specific patterns were further analyzed for notable differences in codon behavior.

Codon enrichment analyses in functional categories and BGCs

Gene products were assigned to functional categories using the eggNOG-mapper software version 2 [40], based on the EggNOG database version 5 [41] and COG categories [42]. To assess the statistical enrichment of genes containing rare codons, we employed hypergeometric tests separately for TTA and AGA codons. These tests determine whether codon-containing genes are overrepresented in specific COG categories compared to the overall gene distribution in each genome. For each strain, the probability of observing at least k codon-containing genes in each COG category was computed using the survival function of the hypergeometric distribution:

$$P(X \ge k) = 1 - \sum_{i=0}^{k-1} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

where N is the total number of genes in the genome, K is the total number of genes assigned to the given COG category, n is the total number of TTA- or AGA-containing genes in the genome, and k is the number of TTA- or AGA-containing genes observed in the COG category.
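To make the formula concrete, the following standard-library Python sketch evaluates P(X ≥ k) by summing the upper tail directly, which is mathematically identical to one minus the lower sum. The function name `hypergeom_enrichment_p` and the example numbers are illustrative assumptions; the symbols N, K, n and k follow the definitions in the text.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total genes in the genome
    K: genes in the category of interest (e.g. one COG category)
    n: codon-containing genes in the genome
    k: codon-containing genes observed in the category
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    upper = min(K, n)  # X cannot exceed either margin
    # Exact integer arithmetic; math.comb returns 0 when the draw is impossible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical genome: 7000 genes, 300 in the category,
    # 150 codon-containing genes, 15 of which fall in the category.
    p = hypergeom_enrichment_p(N=7000, K=300, n=150, k=15)
    print(f"P(X >= 15) = {p:.3g}")
    # With SciPy's parameter order this corresponds to
    # scipy.stats.hypergeom.sf(k - 1, N, K, n).
```

Note the `k - 1` offset when using a survival function: `sf(x)` returns P(X > x), so P(X ≥ k) requires evaluating it at k − 1.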
This test was applied separately to each genome, so that the results reflect the biological variability across strains. Categories with a p-value below 0.05 were considered significantly enriched for TTA- or AGA-containing genes.

To assess whether rare codons are preferentially located at the beginning of genes within specific functional categories, a separate hypergeometric test was performed using only genes containing the codons in position bin 0 (corresponding to the first 10% of the coding sequence). For each strain and each COG category, the number of genes with TTA or AGA codons located specifically in bin 0 was counted and compared to the total number of codon-containing genes across all bins. The test evaluated whether the presence of TTA or AGA codons in bin 0 was significantly overrepresented within a given COG category, compared to a background of all codon-containing genes. The same statistical formulation was used as in the overall enrichment analysis, with the counts adjusted to reflect the positional restriction to bin 0. This analysis was performed for both TTA and AGA, thereby distinguishing functional categories in which rare codons are not only enriched in general but also preferentially positioned at the start of genes.

An analogous enrichment analysis was performed for BGCs to test whether the presence of rare codons is non-randomly associated with secondary metabolism. For each strain, antiSMASH 8.0 predictions were used to identify BGC genes, and the hypergeometric test was applied to determine whether genes containing TTA or AGA codons are statistically overrepresented within BGCs compared to the rest of the genome [44]. Here, K corresponded to the total number of genes assigned to BGCs, and k to the number of rare-codon genes observed in BGCs.

To assess enrichment of rare codons within functional gene categories inside BGCs, we applied Fisher’s exact test.
For each strain and each antiSMASH functional category (biosynthetic, biosynthetic-additional, regulatory, resistance, transport, unknown, and others), a 2 × 2 contingency table was constructed to compare codon-containing and codon-lacking genes inside the category versus all other BGC categories (Table 2):

Table 2 Contingency table for Fisher’s test.

                                 Codon +   Codon −
Genes in category                a         b
Genes in other BGC categories    c         d

where a is the number of genes in the category that contain at least one TTA or AGA codon, b is the number of genes in the category that do not contain the codon, c is the number of codon-containing genes in all other BGC categories, and d is the number of genes without the codon in all other BGC categories. The test evaluates whether the proportion of TTA or AGA codon-containing genes in a given functional category is significantly greater than expected relative to all other BGC genes. Results were summarized at the genus level as the percentage of strains showing significant enrichment ( p 1) for TTA and AGA separately. All enrichment calculations were performed using the hypergeom.sf and fisher_exact functions from the SciPy statistical package in Python [49].

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.

Funding
This work was funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF Grant Number: NNF20CC0035580). T.J.B. and S.E.W. acknowledge support from Novo Nordisk Foundation Postdoctoral Fellowships (NNF Grant Numbers: NNF22OC0078997, NNF22OC0079021). H.U.K. acknowledges support from the National Research Foundation funded by the Korean government (RS-2024-00352229).

Author Contribution
P.C. and A.R. initiated the study. A.R. performed computational analyses and prepared the figures. T.J.B. and P.C.
provided analytical guidance and contributed to data interpretation. S.E.W., K.B., H.U.K. and T.W. contributed to biological interpretation and manuscript revision. All authors approved the final manuscript.

Acknowledgement
The authors thank Simon Shaw for assistance with antiSMASH analyses and colleagues from DTU Biosustain, Natural Product Genome Mining group for helpful discussions.

Data Availability
All data are available as supplementary information in Additional Files 1 to 3. Table S1 contains the accessions of all genomes used in the study. The analysis code for this manuscript is available on GitHub at https://github.com/arudenko-2025/rare-codons-actinomycetota [50], and the corresponding version of the code is also deposited at Zenodo, https://doi.org/10.5281/zenodo.17285473 [51].

References
1. Chater KF, Biro S, Lee KJ, Palmer T, Schrempf H. The complex extracellular biology of Streptomyces. FEMS Microbiology Reviews. 2010 3;34(2):171–98. https://doi.org/10.1111/j.1574-6976.2009.00206.x
2. Chater KF, Chandra G. The use of the rare UUA codon to define expression space for genes involved in secondary metabolism, development and environmental adaptation in Streptomyces. Journal of Microbiology. 2008 2;46(1):1–11. https://doi.org/10.1007/s12275-007-0233-1
3. Liu H, Li J, Singh BK. Harnessing co-evolutionary interactions between plants and Streptomyces to combat drought stress. Nature Plants. 2024 7;10(8):1159–1171. https://doi.org/10.1038/s41477-024-01749-1
4. Goodfellow M, Williams ST. Ecology of actinomycetes. Annu Rev Microbiol. 1983;37:189–216. https://doi.org/10.1146/annurev.mi.37.100183.001201
5. Stach JEM, Bull AT. Estimating and comparing the diversity of marine actinobacteria. Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology. 2005 1;87(1):3–9. https://doi.org/10.1007/S10482-004-6524-1
6. Jørgensen TS, Mohite OS, Sterndorff EB, Alvarez-Arevalo M, Blin K, Booth TJ, et al. A treasure trove of 1034 actinomycete genomes. Nucleic Acids Res.
2024;7(13):7487–503. https://doi.org/10.1093/NAR/GKAE523
7. Baltz RH. Gifted microbes for genome mining and natural product discovery. Journal of Industrial Microbiology and Biotechnology. 2017 5;44(4–5):573–588. https://doi.org/10.1007/s10295-016-1815-x
8. van der Meij A, Worsley SF, Hutchings MI, van Wezel GP. Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiology Reviews. 2017 5;41(3):392–416. https://doi.org/10.1093/femsre/fux005
9. Onaka H. Novel antibiotic screening methods to awaken silent or cryptic secondary metabolic pathways in actinomycetes. The Journal of Antibiotics. 2017 7;70(8):865–870. https://doi.org/10.1038/JA.2017.51
10. Plotkin JB, Kudla G. Synonymous but not the same: The causes and consequences of codon bias. Nature Reviews Genetics. 2011 1;12(1):32–42. https://doi.org/10.1038/nrg2899
11. Quax TEF, Claassens NJ, Söll D, van der Oost J. Codon Bias as a Means to Fine-Tune Gene Expression. Mol Cell. 2015;7(2):149–61. https://doi.org/10.1016/J.MOLCEL.2015.05.035
12. Sharp PM, Li WH. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research. 1987 2;15(3):1281–1295. https://doi.org/10.1093/NAR/15.3.1281
13. Hershberg R, Petrov DA. Selection on codon bias. Annu Rev Genet. 2008;42:287–99. https://doi.org/10.1146/annurev.genet.42.110807.091442
14. Leskiw BK, Bibb MJ, Chater KF. The use of a rare codon specifically during development? Molecular Microbiology. 1991;5(12):2861–2867. https://doi.org/10.1111/j.1365-2958.1991.tb01845.x
15. Chater KF. Streptomyces inside-out: a new perspective on the bacteria that provide us with antibiotics. Philosophical Trans Royal Soc B: Biol Sci. 2006;361(1469):761. https://doi.org/10.1098/RSTB.2005.1758
16. Silov S, Zaburannyi N, Anisimova M, Ostash B. The Use of the Rare TTA Codon in Streptomyces Genes: Significance of the Codon Context? Indian Journal of Microbiology. 2020 3;61(1):24.
https://doi.org/10.1007/S12088-020-00902-6
17. Nguyen KT, Tenor J, Stettler H, Nguyen LT, Nguyen LD, Thompson CJ. Colonial Differentiation in Streptomyces coelicolor Depends on Translation of a Specific Codon within the adpA Gene. Journal of Bacteriology. 2003 12;185(24):7291–7296. https://doi.org/10.1128/JB.185.24.7291-7296.2003
18. Ohnishi Y, Yamazaki H, Kato JY, Tomono A, Horinouchi S. AdpA, a Central Transcriptional Regulator in the A-Factor Regulatory Cascade That Leads to Morphological Development and Secondary Metabolism in Streptomyces griseus. Bioscience, Biotechnology, and Biochemistry. 2005 1;69(3):431–439. https://doi.org/10.1271/BBB.69.431
19. White J, Bibb M. bldA Dependence of Undecylprodigiosin Production in Streptomyces coelicolor A3(2) Involves a Pathway-Specific Regulatory Cascade. J Bacteriol. 1997;179(3):627–33. https://doi.org/10.1128/jb.179.3.627-633.1997
20. Rebets YV, Ostash BO, Fukuhara M, Nakamura T, Fedorenko VO. Expression of the regulatory protein LndI for landomycin E production in Streptomyces globisporus 1912 is controlled by the availability of tRNA for the rare UUA codon. FEMS Microbiology Letters. 2006 3;256(1):30–7. https://doi.org/10.1111/J.1574-6968.2005.00087.X
21. Koshla O, Lopatniuk M, Rokytskyy I, Yushchuk O, Dacyuk Y, Fedorenko V, et al. Properties of Streptomyces albus J1074 mutant deficient in \(\text{tRNA}^{\text{Leu}}_{\text{UAA}}\) gene bldA. Arch Microbiol. 2017;10(8):1175–83. https://doi.org/10.1007/s00203-017-1389-7
22. Lee N, Kim W, Kim JH, Lee Y, Hwang S, Kim G, et al. Regulatory orchestration of FK506 biosynthesis in Streptomyces tsukubaensis NRRL 18488 revealed through systematic analysis. iScience. 2025 5;p. 112698. https://doi.org/10.1016/J.ISCI.2025.112698
23. Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of expression in Escherichia coli. Science. 2009;4(5924):255–8. https://doi.org/10.1126/science.1170160
24. Tuller T, Waldman YY, Kupiec M, Ruppin E.
Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci USA. 2010;2(8):3645–50. https://doi.org/10.1073/pnas.0909910107
25. Sauert M, Temmel H, Moll I. Heterogeneity of the translational machinery: Variations on a common theme. Biochimie. 2015;6:114:39–47. https://doi.org/10.1016/j.biochi.2014.12.011
26. Gingold H, Pilpel Y. Determinants of translation efficiency and accuracy. Mol Syst Biol. 2011;7. https://doi.org/10.1038/msb.2011.14
27. Hershberg R, Petrov DA. General rules for optimal codon choice. PLoS Genet. 2009;7(7). https://doi.org/10.1371/journal.pgen.1000556
28. Pothier JF, Bolt V, Arn F, Frasson D, Rhyner N, Sievers M. High-Quality Draft Genome Sequence of Streptomyces albidoflavus CCOS 2040, Isolated from a Swiss Soil Sample. Microbiology Resource Announcements. 2023 3;12(3):01225–22. https://doi.org/10.1128/MRA.01225-22
29. Zaburannyi N, Rabyk M, Ostash B, Fedorenko V, Luzhetskyy A. Insights into naturally minimised Streptomyces albus J1074 genome. BMC Genomics. 2014;2(1):1–11. https://doi.org/10.1186/1471-2164-15-97
30. Baltz RH. Streptomyces and Saccharopolyspora hosts for heterologous expression of secondary metabolite gene clusters. J Ind Microbiol Biotechnol. 2010;8(8):759–72. https://doi.org/10.1007/s10295-010-0730-9
31. Bilyk O, Sekurova ON, Zotchev SB, Luzhetskyy A. Cloning and heterologous expression of the grecocycline biosynthetic gene cluster. PLoS ONE. 2016;7(7):e0158682. https://doi.org/10.1371/journal.pone.0158682
32. Myronovskyi M, Rosenkranzer B, Nadmid S, Pujic P, Normand P, Luzhetskyy A. Generation of a cluster-free Streptomyces albus chassis strains for improved heterologous expression of secondary metabolite clusters. Metabolic Eng. 2018 9;49:316–24. https://doi.org/10.1016/J.YMBEN.2018.09.004
33. Booth TJ, Shaw S, Cruz-Morales P, Weber T. getphylo: rapid and automatic generation of multi-locus phylogenetic trees. BMC Bioinformatics. 2025 1;26(1):21.
https://doi.org/10.1186/s12859-025-06035-1
34. Nouioui I, Carro L, Garcia-Lopez M, Meier-Kolthoff JP, Woyke T, Kyrpides NC, et al. Genome-based taxonomic classification of the phylum actinobacteria. Frontiers in Microbiology. 2018 8;9(AUG):355158. https://doi.org/10.3389/FMICB.2018.02007
35. Roy M, Martial A, Ahmad S. Disseminated Nocardia beijingensis Infection in an Immunocompetent Patient. Eur J Case Rep Intern Med. 2020 9;7(11). https://doi.org/10.12890/2020_001904
36. Larkin MJ, De Mot R, Kulakov LA, Nagy I. Applied aspects of Rhodococcus genetics. Antonie van Leeuwenhoek. Int J Gen Mol Microbiol. 1998;74(1–3):133–53. https://doi.org/10.1023/a:1001776500413
37. Lal D, Verma M, Behura SK, Lal R. Codon usage bias in phylum Actinobacteria: relevance to environmental adaptation and host pathogenicity. Res Microbiol. 2016;10(8):669–77. https://doi.org/10.1016/j.resmic.2016.06.003
38. Hackl S, Bechthold A. The Gene bldA, a Regulator of Morphological Differentiation and Antibiotic Production in Streptomyces. Arch Pharm. 2015;7(7):455–62. https://doi.org/10.1002/ARDP.201500073
39. Clarke TF, Clark PL. Increased incidence of rare codon clusters at 5’ and 3’ gene termini: implications for function. BMC Genomics. 2010;118(11). https://doi.org/10.1186/1471-2164-11-118
40. Cantalapiedra CP, Hernandez-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol. 2021;38(12):5825–9. https://doi.org/10.1093/molbev/msab293
41. Huerta-Cepas J, Szklarczyk D, Heller D, Hernandez-Plaza A, Forslund SK, Cook H, et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;147(D1):D309–14. https://doi.org/10.1093/nar/gky1085
42. Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28(1):33–6.
https://doi.org/10.1093/nar/28.1.33
43. Nikolaidis M, Hesketh A, Frangou N, Mossialos D, Van de Peer Y, Oliver SG, et al. A panoramic view of the genomic landscape of the genus Streptomyces. Microb Genomics. 2023;9(6). https://doi.org/10.1099/mgen.0.001028
44. Blin K, Shaw S, Vader L, Szenei J, Reitz Z, Augustijn H, et al. antiSMASH 8.0: extended gene cluster detection capabilities and analyses of chemistry, enzymology, and regulation. Nucleic Acids Res. 2025;4. https://doi.org/10.1093/nar/gkaf334
45. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;7(7):1043–55. https://doi.org/10.1101/gr.186072.114
46. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, et al. Mash: Fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;6(1). https://doi.org/10.1186/s13059-016-0997-x
47. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mechanics: Theory Exp. 2008 10;2008(10):P10008. https://doi.org/10.48550/arXiv.0803.0476
48. Letunic I, Bork P. Interactive tree of life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;7(W1):W293–6. https://doi.org/10.1093/nar/gkab301
49. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods. 2020 3;17(3):261–272. https://doi.org/10.1038/s41592-019-0686-2
50. Rudenko A. Code and analysis for Unraveling Rare Codon Bias in Actinomycetota: Lineage-Specific and 5’ Terminal Enrichment Across 1936 Genomes. GitHub. 2025. Available from: https://github.com/arudenko-2025/rare-codons-actinomycetota
51. Rudenko A. arudenko-2025/rare-codons-actinomycetota: v0.1-preprint. Zenodo. 2025.
Available from: https://doi.org/10.5281/zenodo.17285473

Additional Declarations
No competing interests reported.

Supplementary Files
AdditionalFile3.xlsx AdditionalFile2.docx AdditionalFile1.xlsx

Editorial decision: Revision requested 30 Dec, 2025 Reviews received at journal 09 Dec, 2025 Reviews received at journal 17 Nov, 2025 Reviewers agreed at journal 10 Nov, 2025 Reviewers agreed at journal 10 Nov, 2025 Reviewers agreed at journal 10 Nov, 2025 Reviewers invited by journal 09 Nov, 2025 Editor invited by journal 04 Nov, 2025 Editor assigned by journal 02 Nov, 2025 Submission checks completed at journal 02 Nov, 2025 First submitted to journal 31 Oct, 2025
10:10:06","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":49702,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/3f7cc4e22861701809c0c82a.png"},{"id":96318453,"identity":"0763cdc4-a1ff-44cf-8302-115986111680","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"png","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":92357,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/ee22ff3747aac265dd58accc.png"},{"id":96366042,"identity":"b4cb5b7d-e2e1-4033-9a89-5063bb494929","added_by":"auto","created_at":"2025-11-20 10:11:05","extension":"xml","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":151087,"visible":true,"origin":"","legend":"","description":"","filename":"46bd9af952424087b1b6e0dab6cb5c251structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/b524468487e0891faf088739.xml"},{"id":96318457,"identity":"70d82102-c4bb-4c4a-b720-d219b10f5870","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"html","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":168740,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/be04dc8be977c29862d81933.html"},{"id":96318433,"identity":"a5cd2ab5-d394-470a-ad3f-74a7b7599e68","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":664762,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA)\u003c/strong\u003e Heatmap of codon usage across Actinomycetota genomes.
Rows represent individual genomes (colored by genus) and columns represent normalized codon usage. Dendrograms indicate hierarchical clustering of genomes and codons. Strains within the red box show elevated GCC codon usage. \u003cstrong\u003eB)\u003c/strong\u003e Top 15 strains with the highest median GCC codon percentage. Boxes indicate the interquartile range (IQR), medians are shown by horizontal lines, whiskers extend to 1.5×IQR, and outliers are displayed as individual points. ’K.’ denotes \u003cem\u003eKitasatospora\u003c/em\u003e and ’S.’ denotes \u003cem\u003eStreptomyces\u003c/em\u003e. \u003cstrong\u003eC)\u003c/strong\u003e Relationship between genome size and GCC codon usage across Actinomycetota genomes. Each point represents a genome colored by genus.\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/d173402f30669e66571ebf98.jpeg"},{"id":96366098,"identity":"e8d3660a-2b9e-437c-bf89-65bc62d6a8bc","added_by":"auto","created_at":"2025-11-20 10:11:11","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":802711,"visible":true,"origin":"","legend":"\u003cp\u003ePhylogenetic distribution and frequency of TTA-containing genes across the 1936 Actinomycetota genomes. Bar heights represent the percentage of genes containing the TTA codon. A black star indicates strains in which a codon other than TTA is the rarest. Genera are color-coded along the outer ring.
Branch lengths are ignored.\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/8aefcdefefdcb2e16253a7eb.jpeg"},{"id":96365531,"identity":"42b1314c-980f-4610-984d-e9321647a563","added_by":"auto","created_at":"2025-11-20 10:10:29","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":607958,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA) \u003c/strong\u003ePercentage of genes per strain containing at least one TTA codon. Boxes indicate the interquartile range (IQR), medians are shown by horizontal lines, whiskers extend to 1.5×IQR, and outliers are displayed as individual points. Each genus is labeled with its sample size (n). \u003cstrong\u003eB)\u003c/strong\u003e Relationship between the percentage of genes containing TTA codons and the total TTA codon usage per genome. \u003cstrong\u003eC)\u003c/strong\u003e Relationship between genomic GC content and the percentage of genes with TTA codons. In both \u003cstrong\u003eB)\u003c/strong\u003e and \u003cstrong\u003eC)\u003c/strong\u003e, the Pearson correlation coefficient and its significance are shown. Each point represents one genome and genera are color-coded.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/346edea86bd341b71128c6f2.jpeg"},{"id":96366897,"identity":"50a1de66-ad47-4670-832d-295fbdd46822","added_by":"auto","created_at":"2025-11-20 10:12:01","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":924959,"visible":true,"origin":"","legend":"\u003cp\u003eHeatmaps illustrating the normalized codon frequency distribution across positional bins (0–9). Each panel corresponds to a different genus.
Most common codons are at the top and rarest codons at the bottom.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/82846fbc7dc2365d87598b88.png"},{"id":96318446,"identity":"f189de42-109a-4e12-adb2-353226812fce","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":360914,"visible":true,"origin":"","legend":"\u003cp\u003ePearson and Spearman correlation between normalized codon rarity and bin 0 (gene start) enrichment across genera. Each panel corresponds to a different genus. The x-axis indicates normalized rarity (0–1, with 1 being rarest); the y-axis represents the proportion of each codon found in bin 0. Dashed lines indicate linear regression fits based on Pearson correlation. The five most frequently and five least frequently used codons are labelled.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/4fb7679f5f4c4d59c2ed22a6.png"},{"id":96365742,"identity":"aef1ed3b-7a51-4aac-b219-51fbe052df85","added_by":"auto","created_at":"2025-11-20 10:10:44","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":210197,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA) \u003c/strong\u003eTTA codon frequency across 10 gene position bins. \u003cstrong\u003eB) \u003c/strong\u003eAGA codon frequency across 10 gene position bins. \u003cstrong\u003eC) \u003c/strong\u003eTTA codon enrichment in the first and last 10 amino acids of genes. \u003cstrong\u003eD) \u003c/strong\u003eAGA codon enrichment in the first and last 10 amino acids of genes. 
Genera are clustered by similarity in all panels.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/6deb47b27e19a5d9f3a209f3.png"},{"id":96318435,"identity":"c2085b17-6ac9-41b3-a01d-08924ba50a45","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":543278,"visible":true,"origin":"","legend":"\u003cp\u003eEnrichment of TTA and AGA codons within biosynthetic gene clusters (BGCs). \u003cstrong\u003eA)\u003c/strong\u003e Percentage of genomes within each genus showing significant enrichment of TTA (dark bars) or AGA (light bars) codons in BGCs compared to the genomic background. \u003cstrong\u003eB)\u003c/strong\u003e Distribution of TTA codon enrichment across functional BGC gene categories. \u003cstrong\u003eC)\u003c/strong\u003e Distribution of AGA codon enrichment across functional BGC gene categories. Heatmaps in panels \u003cstrong\u003eB)\u003c/strong\u003e and \u003cstrong\u003eC)\u003c/strong\u003e show the proportion of strains per genus with significant enrichment.\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/325d2ec55526e9bac316719c.png"},{"id":104250604,"identity":"b936cc64-6b0a-4253-ad47-e7d66054d396","added_by":"auto","created_at":"2026-03-09 16:01:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5156117,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/9a281394-b096-4405-a4a7-e428fbe06cd9.pdf"},{"id":96365587,"identity":"88d9b3fd-8a23-407b-838b-af6e1322dc9a","added_by":"auto","created_at":"2025-11-20 
10:10:33","extension":"xlsx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":91856,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/e6a4d5f14d6ef519fd91a674.xlsx"},{"id":96318439,"identity":"4e219bd1-05c1-4dab-8d31-548b55268ab0","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":2149521,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile2.docx","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/b4ffc89c67293c6ae394b575.docx"},{"id":96318451,"identity":"dad756b4-d2fc-48a7-a944-762e34e69545","added_by":"auto","created_at":"2025-11-19 18:15:40","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":215043,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7996976/v1/c86d17b99b106addbc4ecbed.xlsx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Unraveling Rare Codon Bias in Actinomycetota: Lineage-Specific and 5’ Terminal Enrichment Across 1936 Genomes","fulltext":[{"header":"Background","content":"\u003cp\u003eActinomycetota, formerly referred to as Actinobacteria, is a large phylum of gram-positive bacteria that are widely distributed in the environment and play key roles in soil ecosystems [\u003cspan additionalcitationids=\"CR2 CR3 CR4\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Many Actinomycetota have large genomes, often ranging from 8 to 10 Mb, encoding remarkable metabolic versatility and complex regulatory networks [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. 
Within Actinomycetota, the genus \u003cem\u003eStreptomyces\u003c/em\u003e is renowned for its ability to produce a wide range of bioactive secondary/specialized metabolites, including antibiotics, antifungals, immunosuppressants and antitumor agents [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. These natural products have a profound impact on medicine and agriculture, yet the biosynthetic potential of many Actinomycetota remains untapped. A significant obstacle in harnessing their full potential lies in the poor expression of biosynthetic gene clusters (BGCs) responsible for the production of secondary metabolites under laboratory conditions [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. A deeper understanding of the regulatory and genetic mechanisms controlling secondary metabolism would therefore benefit efforts to discover new secondary metabolites from Actinomycetota.\u003c/p\u003e\u003cp\u003eCodon usage bias (CUB) refers to the preferential use of certain synonymous codons over others. Codon usage significantly impacts gene expression, protein folding, and fitness by modulating ribosome speed and tRNA availability [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Codons that match abundant tRNAs are often associated with efficient translation and high expression of essential genes [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], while rare codons can introduce translational bottlenecks or act as regulatory elements. 
CUB is shaped by mutational bias, translational selection, and regulatory constraints [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], and serves as an active regulatory layer, particularly when specific codons depend on conditionally expressed tRNAs [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn \u003cem\u003eStreptomyces\u003c/em\u003e, the impact of codon usage bias is exemplified by the rare leucine codon TTA. TTA is the rarest codon in \u003cem\u003eStreptomyces\u003c/em\u003e genomes [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], occurring in only 2\u0026ndash;3% of genes [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], and plays a key role in regulating gene expression. The scarcity of this codon reflects its exclusive dependence on a single specialized tRNA, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{t}\\text{R}\\text{N}\\text{A}}_{\\text{U}\\text{A}\\text{A}}^{\\text{L}\\text{e}\\text{u}}\\)\u003c/span\u003e\u003c/span\u003e, encoded by the \u003cem\u003ebldA\u003c/em\u003e gene [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. First characterized in \u003cem\u003eStreptomyces coelicolor\u003c/em\u003e, \u003cem\u003ebldA\u003c/em\u003e is not constitutively expressed but is induced under specific developmental or environmental conditions, typically during later stages of the life cycle [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. 
One example of a gene regulated by TTA usage is \u003cem\u003eadpA\u003c/em\u003e, which is a global transcriptional regulator essential for development and the onset of secondary metabolism in \u003cem\u003eStreptomyces\u003c/em\u003e [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Its translation depends on a single TTA codon, making it strictly reliant on \u003cem\u003ebldA\u003c/em\u003e. Once translated, AdpA activates genes of BGCs as well as those involved in morphological differentiation [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. In \u003cem\u003eStreptomyces albidoflavus\u003c/em\u003e J1074 (formerly \u003cem\u003eStreptomyces albus\u003c/em\u003e), a widely used host for heterologous expression, deletion of \u003cem\u003ebldA\u003c/em\u003e abolishes the production of both native and heterologously expressed antibiotics [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Similarly, in \u003cem\u003eStreptomyces tsukubaensis\u003c/em\u003e, severe ribosome pausing at a TTA codon within the FK506 BGC creates a translational bottleneck, underscoring the regulatory significance of rare codons during secondary metabolism [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eThe location of a given codon within a gene also affects translation. Rare codons at the start of coding sequences can reduce translation efficiency by slowing ribosome initiation and altering expression timing [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. 
These effects are especially relevant for codons such as TTA, which is known to delay translation when located near the start of transcripts [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Codons located mid-gene may impact folding, while those at the end can influence ribosome release and final folding [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. The positioning of rare codons, particularly TTA, appears to contribute to the regulation of gene expression, enabling precise temporal control of developmental and stress-responsive genes in response to environmental conditions [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eWhile current knowledge about the TTA codon and codon usage bias is derived from a limited number of \u003cem\u003eStreptomyces\u003c/em\u003e genomes, the increasing availability of high-quality sequences across the phylum allows for broader investigation. In this study, we analyze 1936 Actinomycetota genomes across 11 genera to assess the distribution and positional enrichment of codons. In addition to TTA, we also examine the usage and positional biases of other rare codons and their potential roles in translation regulation. We explore whether the regulatory role of TTA, especially its association with secondary metabolism, is conserved across the phylum or represents a lineage-specific adaptation in \u003cem\u003eStreptomyces\u003c/em\u003e.
By analyzing codon usage patterns at scale, we provide new insights into the evolutionary and functional dynamics of codon bias in Actinomycetota and highlight potential strategies for manipulating gene expression in industrial and synthetic biology applications.\u003c/p\u003e"},{"header":"Results and Discussion","content":"\u003cp\u003e\u003cb\u003eCodon usage patterns across Actinomycetota highlight GCC enrichment in\u003c/b\u003e \u003cb\u003eStreptomyces albidoflavus\u003c/b\u003e\u003c/p\u003e\u003cp\u003eTo explore codon usage bias (CUB) across different genera of Actinomycetota, we calculated codon frequencies for all coding sequences from 1936 genomes as relative usage per strain. This dataset comprises 1448 high-quality genomes (NCBI complete/chromosome level) and 488 medium-quality genomes (CheckM completeness\u0026thinsp;\u0026gt;\u0026thinsp;90%, contamination\u0026thinsp;\u0026lt;\u0026thinsp;5%, \u0026lt; 50 contigs). The genomes used in this study and metadata are listed in Additional file 1, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e. To visualize usage patterns, a heatmap was generated and ordered using hierarchical clustering (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). The clustering captures distinct genus-specific CUBs with clear groupings observed for \u003cem\u003eMicromonospora\u003c/em\u003e, \u003cem\u003eKribbella\u003c/em\u003e and \u003cem\u003eNocardia\u003c/em\u003e. The tight clustering of these genera reflects highly conserved CUB within each group. In contrast, clustering for other genera such as \u003cem\u003eStreptomyces\u003c/em\u003e, \u003cem\u003eRhodococcus\u003c/em\u003e and \u003cem\u003eKitasatospora\u003c/em\u003e is more variable.\u003c/p\u003e\u003cp\u003eGC-rich codons such as GCC, CTG, GGC, GAC and GCG, which encode alanine, leucine, glycine, aspartate, and alanine, respectively, were the most frequently used across all strains.
This pattern reflects the high GC content of most Actinomycetota and supports previous findings that GC content is a dominant force shaping codon usage bias in bacteria [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. In contrast, the rarest codons were TTA, CTA, ATA, AGA and TTT, which encode leucine, leucine, isoleucine, arginine, and phenylalanine, respectively. Of these, TTA was the least used across all genomes. As the only leucine codon lacking a G or C, TTA is particularly notable for its well-established regulatory role in \u003cem\u003eStreptomyces\u003c/em\u003e, for which translation of TTA-containing genes depends on the developmentally controlled \u003cem\u003ebldA\u003c/em\u003e tRNA [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003cem\u003eStreptomyces albidoflavus\u003c/em\u003e and several strains of \u003cem\u003eKitasatospora\u003c/em\u003e showed a marked enrichment of the GCC codon (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA, red box; Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). \u003cem\u003eS. albidoflavus\u003c/em\u003e has a relatively compact genome (~\u0026thinsp;6.8\u0026ndash;7.2 Mb), smaller than the 8\u0026ndash;9 Mb genomes typical of most \u003cem\u003eStreptomyces\u003c/em\u003e [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. This combination of an elevated GCC percentage and a compact genome appears to be unique to \u003cem\u003eS. albidoflavus.\u003c/em\u003e In our dataset, there was no correlation between genome size and GCC usage (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.02, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC).
The elevated GCC usage might reflect selective pressure to optimize translational efficiency by reducing reliance on rare codon-specific tRNAs, potentially contributing to the strain\u0026rsquo;s robustness and reliability as a chassis for heterologous expression [\u003cspan additionalcitationids=\"CR31\" citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e].\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003ePhylogenetic and genus-level distribution of TTA codon usage\u003c/h2\u003e\u003cp\u003eBecause of its established regulatory role in several \u003cem\u003eStreptomyces\u003c/em\u003e strains and to explore its potential evolutionary significance, we investigated the distribution of the TTA codon across the entire dataset. First, we quantified the frequency of the TTA codon across all genomes by calculating the percentage of genes containing at least one TTA codon for each strain. To place these patterns in an evolutionary context, we reconstructed a species tree using getphylo [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) and rooted it at the most recent common ancestor (MRCA) of the genus \u003cem\u003eKribbella\u003c/em\u003e based on prior phylogenomic studies [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. We then assessed the emergence and distribution of TTA codon enrichment as a potential evolutionary development across lineages.\u003c/p\u003e\u003cp\u003eAcross all genomes, the percentage of genes containing the TTA codon ranged from as low as 0.5% to as high as 30.1%, highlighting substantial variation in the prevalence of this rare codon (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). 
The \u003cem\u003eNocardiaceae\u003c/em\u003e genera \u003cem\u003eNocardia\u003c/em\u003e and \u003cem\u003eRhodococcus\u003c/em\u003e exhibited both the highest median percentage of TTA-containing genes and the greatest variability across strains, with medians of 8.9% and 8.4% and ranges of 2.7%\u0026ndash;30.1% and 1.8%\u0026ndash;22.0%, respectively. This combination of elevated medians and broad distributions suggests a complex evolutionary dynamic. Phylogenetic analysis indicates that enrichment of TTA-containing genes likely arose independently in multiple lineages, including within the \u003cem\u003eNocardiaceae\u003c/em\u003e family. Other genera may share a similar evolutionary trajectory, although current small sample sizes for some genera may be insufficient to fully resolve these patterns.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThis observation supports the idea that elevated TTA codon usage may reflect adaptive responses to lineage-specific ecological and regulatory demands. In \u003cem\u003eStreptomyces\u003c/em\u003e, the rare TTA codon is decoded by the \u003cem\u003ebldA\u003c/em\u003e-encoded tRNA, which plays a key role in controlling developmental processes such as sporulation and the activation of secondary metabolism [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In contrast, \u003cem\u003eNocardia\u003c/em\u003e and \u003cem\u003eRhodococcus\u003c/em\u003e are non-sporulating genera [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e], and thus lack this developmental context for TTA regulation. In these taxa, high TTA usage is likely to be associated with alternative regulatory roles. Although the exact functions remain unknown, they may include stress response regulation, metabolic versatility, or control of horizontally acquired genes. 
Such lineage-specific differences in codon function are consistent with previous observations that codon usage patterns in Actinomycetota can be shaped by ecological and evolutionary pressures [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn contrast, a much lower percentage of genes in \u003cem\u003eStreptomyces\u003c/em\u003e contain the TTA codon, ranging from 0.8% to 6% with a median of 2.4%, indicating a more constrained pattern across strains. \u003cem\u003eKitasatospora\u003c/em\u003e (0.5%\u0026ndash;2.5%, median 1.5%) and \u003cem\u003ePseudonocardia\u003c/em\u003e (0.8%\u0026ndash;1.6%, median 1.2%) showed the lowest median values with minimal variability. The narrower distributions observed in these genera suggest that TTA codon usage is under stronger stabilizing constraints or has remained relatively unchanged over extended evolutionary periods, potentially reflecting conserved regulatory strategies or reduced selective pressures for TTA codon expansion.\u003c/p\u003e\u003cp\u003eThe percentage of genes containing TTA in \u003cem\u003eActinokineospora\u003c/em\u003e, \u003cem\u003eKribbella\u003c/em\u003e, \u003cem\u003eSaccharopolyspora\u003c/em\u003e, \u003cem\u003eAmycolatopsis\u003c/em\u003e, \u003cem\u003eSaccharothrix\u003c/em\u003e, and \u003cem\u003eMicromonospora\u003c/em\u003e fell between these extremes. For example, in \u003cem\u003eActinokineospora\u003c/em\u003e, 1.8%\u0026ndash;11.4% of genes contained at least one TTA codon with a relatively high median of 8.3%, whereas in \u003cem\u003eKribbella\u003c/em\u003e, the range was 2.5%\u0026ndash;13.1% with a median of 5.7%. Although \u003cem\u003eKribbella\u003c/em\u003e was used as the root of the phylogenetic tree, its moderate and variable TTA percentage suggests that TTA codon usage underwent repeated, lineage-specific gains and losses in multiple clades since the divergence of \u003cem\u003eKribbella\u003c/em\u003e.
Despite these differences among genera, TTA remained the rarest codon in nearly all strains analyzed with only three exceptions: one \u003cem\u003eKribbella\u003c/em\u003e strain where ATA (isoleucine) was rarer, one \u003cem\u003eNocardia\u003c/em\u003e strain where both ATA and AGA (arginine) were rarer, and one \u003cem\u003eSaccharopolyspora\u003c/em\u003e strain where ATA was rarer (indicated by stars in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eNext, we examined the relationship between the proportion of genes that contain the rare TTA codon and the overall TTA codon usage per genome (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). Across all genomes, this relationship was highly linear (\u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.99), reflecting that TTA codon usage is broadly distributed among the genes in a genome rather than concentrated in outliers. When analyzed at the genus level (Additional file 2: Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), this pattern remained consistent in most taxa with Pearson correlation coefficients ranging from \u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.96 to \u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.00. A notable exception was \u003cem\u003eStreptomyces\u003c/em\u003e, which exhibited a weaker correlation (\u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.74), indicating greater heterogeneity in the number of TTA codons per gene. This deviation was driven by a single outlier genome, \u003cem\u003eStreptomyces\u003c/em\u003e sp. f51, which carried a hypothetical plasmid sequence consisting of low-complexity ORFs containing hundreds of TTA codons. These sequences artificially inflated rare-codon counts relative to the proportion of TTA-positive genes.
Removing this genome restored the correlation to \u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.99 (Additional file 2: Fig. \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e), consistent with the other genera.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eTo test whether overall base composition shapes rare codon usage, we examined how genome-wide GC content correlates with the proportion of genes containing at least one TTA codon (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC). Across all genomes, a strong inverse correlation was observed (\u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;\u0026minus;\u0026thinsp;0.85), indicating that genomes with lower GC content tend to harbor a higher fraction of genes containing the AT-rich TTA codon. Genus-level analysis (Additional file 2: Fig. \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003eS3\u003c/span\u003e) showed that this pattern is largely conserved across the genera, with individual genera exhibiting correlation values ranging from \u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;\u0026minus;\u0026thinsp;0.90 in \u003cem\u003eMicromonospora\u003c/em\u003e to \u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;\u0026minus;\u0026thinsp;0.36 in \u003cem\u003eStreptomyces\u003c/em\u003e. The relatively weak correlation in \u003cem\u003eStreptomyces\u003c/em\u003e, despite its characteristically high GC content, suggests that TTA codon presence in this genus may also be maintained by conserved regulatory functions, particularly those tied to \u003cem\u003ebldA\u003c/em\u003e-mediated control of key developmental and biosynthetic genes, rather than being solely explained by genomic composition. 
\u003cem\u003eRhodococcus\u003c/em\u003e and \u003cem\u003eNocardia\u003c/em\u003e, where strong GC\u0026ndash;TTA correlations exist and TTA codon usage is high, may use the TTA codon less prominently as a dedicated regulatory element, with its occurrence more strongly influenced by local sequence context or overall nucleotide composition. Nevertheless, given that these organisms still encode only a single dedicated \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{t}\\text{R}\\text{N}\\text{A}}_{\\text{U}\\text{A}\\text{A}}^{\\text{L}\\text{e}\\text{u}}\\)\u003c/span\u003e\u003c/span\u003e (Additional file 3: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e), TTA codon usage may remain relevant at the translational level, potentially affecting translation efficiency or timing in specific genes despite the absence of a broader regulatory mechanism.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eRare codons are enriched at the 5' termini of genes\u003c/h3\u003e\n\u003cp\u003eAs noted in the introduction, the position of codons within a gene can influence translation initiation and protein folding. We therefore analyzed the positional distribution of all codons. For comparability across genes of varying lengths, each coding sequence was divided into ten equal-sized segments (bins 0\u0026ndash;9) across the entire length of the gene. Bin 0 corresponds to the bin closest to the 5' terminus while bin 9 corresponds to the bin closest to the 3' terminus. For each codon, we calculated the number of times it fell within each bin relative to its total count in the gene. Codon rarity was defined as the square root transformation of the inverse of the total codon counts across all positional bins, normalized to range from 0 (least rare) to 1 (most rare). 
This transformation reduces the skew caused by highly abundant codons and provides a more balanced scale for comparing codons of different overall frequencies. To quantify the relationship between codon rarity and positional enrichment, we calculated both Spearman and Pearson correlations (Additional file 3: Tables S3 and S4) between codon rarity and normalized frequencies in each bin.\u003c/p\u003e\u003cp\u003eUsing this rarity metric, we next examined how codons were distributed along genes. Across all genera, most codons with higher usage were distributed relatively uniformly across the entire gene length. In contrast, rarer codons exhibited a marked enrichment in bin 0 and a weaker enrichment in bin 9 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). While both correlation measures showed similar trends, Spearman coefficients were prioritized for interpretation because they do not assume a linear relationship between variables (Additional file 3: Tables S3 and S4). Across all genera, the strongest positive associations between codon rarity and positional frequency occurred in bin 0 (Spearman\u0026thinsp;=\u0026thinsp;0.69\u0026ndash;0.84), confirming the pronounced enrichment of rare codons at the 5\u0026prime; terminus (Fig.\u0026nbsp;5). In contrast, bins 1\u0026ndash;8 showed a marked depletion of rare codons, reflected in consistently negative correlations (typically \u0026minus;\u0026thinsp;0.5 to \u0026minus;\u0026thinsp;0.8), with the most pronounced values in bins 3\u0026ndash;7. This pattern was consistent across genera, interrupted only by a few instances of negative correlations close to 0 (e.g., bin 1 in \u003cem\u003eStreptomyces\u003c/em\u003e and \u003cem\u003eKitasatospora\u003c/em\u003e). Correlations increased again in bin 9 (0.42\u0026ndash;0.79), indicating a secondary but weaker enrichment at the 3\u0026prime; terminus (Additional file 2: Fig. S4). 
Together, these results reveal a U-shaped positional bias: rare codons preferentially cluster at gene ends and are underrepresented in the central coding region. This positional bias in rare codons suggests they may play regulatory roles at gene termini, influencing translation dynamics or timing.\u003c/p\u003e\u003cp\u003eWe next focused specifically on the TTA and AGA codons. We included AGA because, like TTA, it is a rare codon and showed strong enrichment at the 5' termini in several genera (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Heatmaps of TTA codon frequencies across bins revealed genus-specific patterns (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). Strong enrichment of TTA at the 5' termini was observed in \u003cem\u003eStreptomyces\u003c/em\u003e, \u003cem\u003eKitasatospora\u003c/em\u003e and \u003cem\u003eSaccharothrix\u003c/em\u003e, consistent with known regulatory roles of TTA in \u003cem\u003eStreptomyces\u003c/em\u003e. In these genera, TTA likely acts as a translational checkpoint regulated by \u003cem\u003ebldA\u003c/em\u003e [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Mechanistically, 5' termini enrichment of rare codons such as TTA may delay early elongation when the corresponding tRNA is scarce, thereby restricting translation until specific developmental or environmental signals induce \u003cem\u003ebldA\u003c/em\u003e expression.\u003c/p\u003e\u003cp\u003eIn contrast, genera such as \u003cem\u003eNocardia\u003c/em\u003e, \u003cem\u003eRhodococcus\u003c/em\u003e and \u003cem\u003ePseudonocardia\u003c/em\u003e exhibited a more uniform internal distribution of TTA codons despite their relatively high percentage of TTA-containing genes. 
This pattern suggests that, in these genera, TTA may play a regulatory role through its overall prevalence rather than through preferential positioning at specific gene termini. Other genera, including \u003cem\u003eSaccharopolyspora\u003c/em\u003e, \u003cem\u003eActinokineospora\u003c/em\u003e, \u003cem\u003eAmycolatopsis\u003c/em\u003e and \u003cem\u003eKribbella\u003c/em\u003e, displayed moderate enrichment at the 5' end, though less pronounced than in \u003cem\u003eStreptomyces\u003c/em\u003e. A weaker enrichment of TTA was also noted at the 3' end in certain genera, for example \u003cem\u003ePseudonocardia\u003c/em\u003e, \u003cem\u003eAmycolatopsis\u003c/em\u003e and \u003cem\u003eKribbella\u003c/em\u003e, while others such as \u003cem\u003eRhodococcus\u003c/em\u003e showed no enrichment at either terminus. Rare codon-induced pausing near stop codons has been associated with changes in transcript longevity or ribosome release [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003e\u003cb\u003eFigure\u0026nbsp;5\u003c/b\u003e Pearson and Spearman correlation between normalized codon rarity and bin 0 (gene start) enrichment across genera. Each panel corresponds to a different genus. The x-axis indicates normalized rarity (0\u0026ndash;1, with 1 being rarest); the y-axis represents the proportion of each codon found in bin 0. Dashed lines indicate linear regression fits based on Pearson correlation. The five most frequently and five least frequently used codons are labelled.\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eA similar analysis for the AGA codon showed enrichment patterns that mirrored those of TTA in several genera (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e6\u003c/span\u003eC). 
Strong 5' enrichment of the AGA codon was observed in \u003cem\u003eMicromonospora\u003c/em\u003e, \u003cem\u003eSaccharothrix\u003c/em\u003e, \u003cem\u003eKitasatospora\u003c/em\u003e, \u003cem\u003eAmycolatopsis\u003c/em\u003e and \u003cem\u003eStreptomyces\u003c/em\u003e, suggesting that AGA might also contribute to regulation of translation initiation. Although AGA is not decoded by the \u003cem\u003ebldA\u003c/em\u003e-encoded tRNA and instead relies on a distinct tRNA pool, its positional bias indicates it may serve a similar regulatory function. Increasing the availability of tRNA\u003csup\u003eArg\u003c/sup\u003e could potentially enhance the translation of AGA-rich genes, providing a mechanism for modulating their expression. Since a fraction of these AGA-rich genes are located within BGCs (Additional file 2: Fig. S5), changes in tRNA availability could affect secondary metabolism. In \u003cem\u003eSaccharopolyspora\u003c/em\u003e, \u003cem\u003eKribbella\u003c/em\u003e and \u003cem\u003ePseudonocardia\u003c/em\u003e, AGA showed modest enrichment at both termini.\u003c/p\u003e\u003cp\u003eTo complement the positional analysis and assess potential biases introduced by gene length normalization, we also quantified the frequency of TTA and AGA codons within the first and last 10 amino acids of coding sequences (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, D). 
This analysis confirmed the positional analysis findings: both TTA and AGA codons were preferentially localized at the 5' end in several genera, particularly \u003cem\u003eStreptomyces\u003c/em\u003e, \u003cem\u003eKitasatospora\u003c/em\u003e and \u003cem\u003eSaccharothrix\u003c/em\u003e.\u003c/p\u003e\u003cp\u003e\u003cb\u003eTTA and AGA codons are enriched in the COG categories Replication, Recombination and Repair and Transcription\u003c/b\u003e\u003c/p\u003e\u003cp\u003eWe next examined the functional categories associated with genes containing rare codons based on Clusters of Orthologous Groups (COG) annotations. Annotations were assigned by eggNOG-mapper version 2 [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e] based on eggNOG database version 5 [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] and COG classifications [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. For TTA and AGA we defined two gene sets: (i) genes containing at least one instance of the codon anywhere in the coding sequence (gene-wide set), and (ii) genes in which the codon occurs within the first 10% of the coding sequence (5\u0026rsquo; terminus set). Within each strain, we investigated whether TTA and AGA codons were overrepresented in any COG category relative to the genomic background using a hypergeometric test. Significance was assessed at p\u0026thinsp;\u0026lt;\u0026thinsp;0.05. To compare across taxa, we report for each genus the proportion of strains showing significant enrichment in each COG category. 
The results are provided in Additional file 3, Tables S5\u0026ndash;S8 (gene-wide: S5, S7; 5\u0026rsquo; termini: S6, S8).\u003c/p\u003e\u003cp\u003eFor the gene-wide set, TTA codons were most often enriched in categories related to Replication, Recombination and Repair, with very high frequencies in \u003cem\u003eNocardia\u003c/em\u003e (90%), \u003cem\u003eAmycolatopsis\u003c/em\u003e (87%) and \u003cem\u003ePseudonocardia\u003c/em\u003e (86%) (Additional file 3: Table S5). Enrichment in Transcription was also seen in \u003cem\u003eNocardia\u003c/em\u003e (68%), \u003cem\u003eRhodococcus\u003c/em\u003e (59%) and \u003cem\u003eStreptomyces\u003c/em\u003e (58%). The category Secondary Metabolism was enriched in several genera, for example \u003cem\u003eActinokineospora\u003c/em\u003e (73%) and \u003cem\u003eNocardia\u003c/em\u003e (55%). Although TTA is known to affect secondary metabolism in \u003cem\u003eStreptomyces\u003c/em\u003e, only 25% of \u003cem\u003eStreptomyces\u003c/em\u003e strains showed enrichment of TTA within this category. A survey of 213 \u003cem\u003eStreptomyces\u003c/em\u003e genomes reported a similar trend for the TTA codon: enrichment in Replication, Recombination and Repair and in Transcription, but also in Secondary Metabolism [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Similarly, AGA showed strong enrichment in Replication, Recombination and Repair (e.g., \u003cem\u003eKitasatospora\u003c/em\u003e, 97%; \u003cem\u003eStreptomyces\u003c/em\u003e, 94%) and Transcription (e.g., \u003cem\u003eRhodococcus\u003c/em\u003e, 90%; \u003cem\u003eNocardia\u003c/em\u003e, 90%) (Additional file 3: Table S7). However, unlike TTA, AGA enrichment was detected in a higher proportion of strains across multiple genera, indicating that its functional association with these two categories is more broadly conserved. In Secondary Metabolism, AGA showed lower enrichment than TTA: the average across all genera was 29% for TTA versus 22% for AGA. 
Although enrichment of AGA in Secondary Metabolism was less pronounced than that observed for TTA, it was still enriched in several genera, particularly \u003cem\u003eNocardia\u003c/em\u003e (67%) and \u003cem\u003eRhodococcus\u003c/em\u003e (43%).\u003c/p\u003e\u003cp\u003eFor the TTA 5\u0026rsquo; terminus set (Additional file 3: Table S6), Cell Wall, Membrane and Envelope Biogenesis was the most frequently enriched COG category across strains. This effect was pronounced in \u003cem\u003eKribbella\u003c/em\u003e (95%) and \u003cem\u003eActinokineospora\u003c/em\u003e (91%). The Secondary Metabolism category also showed high 5' terminus enrichment, with all \u003cem\u003eSaccharothrix\u003c/em\u003e and 74% of \u003cem\u003eKitasatospora\u003c/em\u003e strains showing enrichment. In contrast, analysis of the AGA codon (Additional file 3: Table S8) revealed 5\u0026rsquo; terminus enrichment in nearly all strains spanning many genera and functional categories. For example, genes related to Amino Acid and Carbohydrate Metabolism, Cell Wall Biogenesis and Secondary Metabolism were enriched in \u0026gt;\u0026thinsp;90% of strains in nearly all genera.\u003c/p\u003e\u003cp\u003eAlthough TTA and AGA differ in the proportion of strains showing enrichment in their genus-level distribution, they occur in largely overlapping functional categories. Both codons are consistently enriched in Replication, Recombination and Repair, Transcription and Secondary Metabolism, with AGA showing a particularly strong signal in Transcription. Enrichment at the 5\u0026prime; terminus suggests that both codons contribute to translational control, complementing transcriptional regulation. 
Notably, genus-specific patterns (e.g., gene-wide AGA enrichment in \u003cem\u003eRhodococcus\u003c/em\u003e and \u003cem\u003eNocardia\u003c/em\u003e, but absence in \u003cem\u003eActinokineospora\u003c/em\u003e and \u003cem\u003eAmycolatopsis\u003c/em\u003e) highlight possible lineage-specific regulatory roles.\u003c/p\u003e\n\u003ch3\u003eantiSMASH-based BGC annotation reveals greater AGA enrichment compared to TTA\u003c/h3\u003e\n\u003cp\u003eIn addition to COG, we employed antiSMASH 8.0 to predict the BGCs within each strain and to obtain more detailed information about the genes within the BGCs [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. We then examined the occurrence of TTA and AGA codons within BGCs. For each strain, we first calculated the proportion of BGCs containing at least one gene with TTA or AGA, representing the overall prevalence of these two codons at the BGC level (Fig. S5). TTA codons (Fig. S5A) were found in BGCs of nearly all genera but at varying frequency. High prevalence was observed in \u003cem\u003eNocardia\u003c/em\u003e, \u003cem\u003eRhodococcus\u003c/em\u003e and \u003cem\u003eKribbella\u003c/em\u003e and lower prevalence in \u003cem\u003eKitasatospora\u003c/em\u003e and \u003cem\u003ePseudonocardia\u003c/em\u003e. In contrast, AGA codons (Fig. S5B) showed less variability, displaying a more uniform and generally higher prevalence than TTA.\u003c/p\u003e\u003cp\u003eTo assess whether BGCs contain more rare codons than expected given the genomic background, we applied a hypergeometric test to each genome (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003eA). Both TTA and AGA codons were significantly overrepresented in BGCs, but the proportion of strains with significant enrichment differed among genera. 
AGA enrichment was highest in \u003cem\u003eKribbella\u003c/em\u003e (91% of strains) and \u003cem\u003eActinokineospora\u003c/em\u003e (82%), whereas TTA was highest in \u003cem\u003eStreptomyces\u003c/em\u003e (71%) and \u003cem\u003eKitasatospora\u003c/em\u003e (70%). For comparison, when using COG-based annotations the enrichment was detected in only 25% of \u003cem\u003eStreptomyces\u003c/em\u003e genomes, highlighting that antiSMASH provides a more accurate framework for studying secondary metabolism. Some genera displayed opposite codon preferences: in \u003cem\u003eKribbella\u003c/em\u003e, AGA was highly enriched (91%) while TTA was much less frequent (28%), whereas in \u003cem\u003eStreptomyces\u003c/em\u003e TTA enrichment (71%) exceeded AGA (57%). The results differed markedly depending on whether enrichment in secondary metabolism genes was assessed using COG or antiSMASH. With COG, enrichment was detected in an average of 29% of strains among all genera for TTA and 22% for AGA, whereas antiSMASH-based annotations yielded higher values of 37% for TTA and 51% for AGA. This discrepancy highlights that COG functional classification does not fully capture BGC information, and that specialized tools such as antiSMASH provide a more accurate representation of secondary metabolism. Notably, while COG-based analyses suggested that TTA was more strongly enriched, antiSMASH results indicate that AGA enrichment is even more pronounced, underscoring its potential importance in the regulation of secondary metabolism. Values for AGA and TTA enrichment across genera are provided in Additional file 3, Table S9.\u003c/p\u003e\u003cp\u003eTo resolve which functional gene categories within BGCs drive these signals, we applied Fisher\u0026rsquo;s exact test for each genome, comparing codon presence versus absence in each antiSMASH annotation category (biosynthetic, biosynthetic-additional, regulatory, resistance, transport, and others). 
For both codons, biosynthetic genes showed the highest overall enrichment (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003eB\u0026ndash;C). Among biosynthetic genes, AGA showed the highest enrichment in \u003cem\u003eNocardia\u003c/em\u003e (81% of strains) and \u003cem\u003eRhodococcus\u003c/em\u003e (78%). TTA was also enriched in these genera, but generally at lower frequencies; in \u003cem\u003eNocardia\u003c/em\u003e, however, TTA reached levels comparable to AGA. Regulatory genes displayed a distinct pattern: TTA codons were frequently enriched in this category, with up to 45% of \u003cem\u003eKitasatospora\u003c/em\u003e and 43% of \u003cem\u003eStreptomyces\u003c/em\u003e strains showing significance, whereas AGA codons were less often enriched in regulatory genes in these taxa. However, in other genera, including \u003cem\u003eMicromonospora\u003c/em\u003e, \u003cem\u003eAmycolatopsis\u003c/em\u003e and \u003cem\u003eActinokineospora\u003c/em\u003e, AGA enrichment in regulatory genes exceeded that of TTA.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eTogether, these results demonstrate that rare codons are not randomly distributed within BGCs, with both genus-specific and functional-category\u0026ndash;specific patterns. TTA enrichment in regulatory genes supports the established model in which rare codons act as translational checkpoints to control BGC activation, particularly in \u003cem\u003eStreptomyces\u003c/em\u003e and \u003cem\u003eKitasatospora\u003c/em\u003e. At the same time, the consistent presence of both codons in biosynthetic core genes indicates an additional role in tuning enzyme expression, potentially affecting pathway efficiency, flux, and timing. 
Importantly, the enrichment of AGA codons in regulatory genes in several genera highlights that codon-mediated regulation is more diverse than previously recognized and not restricted to TTA, contributing to the regulation and fine-tuning of BGC expression across Actinomycetota. Such lineage-specific reliance on codon usage could help explain why BGC activation varies widely across taxa and why many natural products remain cryptic under standard laboratory conditions.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study presents a systematic analysis of codon usage patterns in 1936 high- and medium-quality Actinomycetota genomes across 11 genera. We show that the proportion of TTA-containing genes is strongly associated with genomic GC content, with lower GC genomes generally carrying more TTA codons. \u003cem\u003eStreptomyces\u003c/em\u003e is a notable exception\u0026mdash;despite its high GC content, it retains TTA codons in key regulatory genes, reflecting preservation for specific, functional usage rather than compositional bias.\u003c/p\u003e\u003cp\u003eBeyond compositional trends, we observed a clear overall pattern that rarer codons are preferentially enriched at the 5' termini of genes. Both TTA and AGA codons were most consistently concentrated within the first 10% of coding sequences (bin 0), highlighting this region as a hotspot for translational control. Within this shared trend, codon-specific differences emerged: AGA codons showed enrichment at gene starts across multiple genera, suggesting a broadly conserved role in modulating translation initiation. 
In contrast, TTA codon enrichment at the 5' end was more genus-specific, with strong signals in taxa such as \u003cem\u003eKitasatospora\u003c/em\u003e, \u003cem\u003eSaccharothrix\u003c/em\u003e and \u003cem\u003eStreptomyces\u003c/em\u003e but weaker in \u003cem\u003eNocardia\u003c/em\u003e and \u003cem\u003eRhodococcus\u003c/em\u003e.\u003c/p\u003e\u003cp\u003eFunctional enrichment analysis using COG categories further revealed that both TTA and AGA codons are associated with cellular processes such as Replication, Recombination and Repair, Transcription, Secondary Metabolism and Cell Wall Biogenesis. While the two codons often overlapped in enriched categories, their strengths and distributions differed. This observation suggests that AGA, like TTA, may play a broader role in regulatory tuning across Actinomycetota. Within BGCs, both codons were significantly overrepresented compared to the genomic background, with the strongest enrichment observed in biosynthetic genes, followed by regulatory genes. Overall, AGA codons tended to show stronger enrichment in biosynthetic genes, particularly in genera such as \u003cem\u003eRhodococcus\u003c/em\u003e and \u003cem\u003eAmycolatopsis\u003c/em\u003e, whereas in \u003cem\u003eNocardia\u003c/em\u003e both codons reached similarly high levels. In contrast, TTA codons were more frequently enriched in \u003cem\u003eStreptomyces\u003c/em\u003e and \u003cem\u003eKitasatospora\u003c/em\u003e, supporting their role as translational checkpoints in these taxa. In several other genera, including \u003cem\u003eMicromonospora\u003c/em\u003e, \u003cem\u003eActinokineospora\u003c/em\u003e, and \u003cem\u003eNocardia\u003c/em\u003e, enrichment of AGA codons exceeded that of TTA in regulatory genes of BGCs. Together, these patterns indicate that both codons contribute to the fine-tuning of secondary metabolism.\u003c/p\u003e\u003cp\u003eThis work lays the foundation for several future directions. 
Experimental validation using ribosome profiling or codon-specific reporter constructs could confirm the functional consequences of TTA and AGA enrichment at the 5' and 3' termini. Engineering strategies for heterologous expression of biosynthetic gene clusters may benefit from codon optimization approaches that account for both positional and phylogenetic codon usage biases. Finally, integrating codon usage with analyses of horizontal gene transfer could clarify how rare codons contribute to genome plasticity and the modular regulation of secondary metabolism. In practical terms, when expressing BGCs, either in their native host or in a heterologous chassis, special attention should be given to rare codons within the first 10% of the coding sequence, as this region is most likely to influence translation initiation and overall expression efficiency.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eData collection and quality check\u003c/h2\u003e\u003cp\u003eGenome sequences for 11 genera within the Actinomycetota phylum were downloaded from the NCBI RefSeq database on November 13, 2024. To ensure data quality and taxonomic consistency, several filtering steps were applied. First, genomes with failed Average Nucleotide Identity (ANI) checks of taxonomy were removed from the dataset. Subsequently, the genomes were categorized into three quality groups based on assembly completeness, contamination levels, and the number of contigs. High-quality (HQ) genomes included those annotated as complete or at the chromosome level by NCBI. Medium-quality (MQ) genomes were defined as those with completeness\u0026thinsp;\u0026gt;\u0026thinsp;90%, contamination\u0026thinsp;\u0026lt;\u0026thinsp;5% based on CheckM version 1.2.2 [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e], and fewer than 50 contigs. 
Low-quality (LQ) genomes, which did not meet these criteria, were excluded from further analysis. Genera were selected for downstream analysis based on the availability of at least 10 high- and/or medium-quality genomes, ensuring robust comparative analyses (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). For the genus \u003cem\u003eStreptomyces\u003c/em\u003e, only high-quality genomes were used since this genus was highly overrepresented among Actinomycetota. Following these filters, the final dataset comprised 1936 genomes. The corresponding accession IDs and metadata for all included genomes are provided in Additional file 1, Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSummary of genomes included in this study. 
HQ: High Quality, MQ: Medium Quality\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGenus\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHQ\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eMQ\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eTotal\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eActinokineospora\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e11\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eAmycolatopsis\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e26\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e34\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e60\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eKitasatospora\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e37\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e40\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e77\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eKribbella\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e11\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e46\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e57\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eMicromonospora\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e71\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e113\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e184\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eNocardia\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e36\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e127\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e163\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003ePseudonocardia\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e13\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e\u003cp\u003e14\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eRhodococcus\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e108\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e186\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eSaccharopolyspora\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e16\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e22\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eSaccharothrix\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e11\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e15\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cem\u003eStreptomyces\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e1147\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eNone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1147\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eRepresentative genome 
selection using MASH\u003c/h3\u003e\n\u003cp\u003eTo identify representative genomes among duplicates, pairwise genomic distances were calculated using MASH [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e], and genomes with distances below a specified threshold (0.001) were considered duplicates. A graph-based approach was employed where each genome was represented as a node, and edges were drawn between nodes with pairwise distances below the threshold. The Louvain community detection algorithm [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e] was applied to group genomes into clusters based on their connectivity within the graph. For each cluster, a representative genome was selected. In clusters containing only one genome, that genome was directly chosen as the representative. For clusters with multiple genomes, the genome with the smallest average pairwise distance to all others in the cluster was selected, ensuring that the representative genome was the most central and typical within the group.\u003c/p\u003e\n\u003ch3\u003ePhylogenetic tree construction using Getphylo\u003c/h3\u003e\n\u003cp\u003eWe used the getphylo pipeline to construct a phylogenetic tree of the analyzed genomes based on 23 conserved single copy loci [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] and visualized the resulting tree using the Interactive Tree of Life (iTOL) [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. 
The tree was rooted at the most recent common ancestor of the genus \u003cem\u003eKribbella\u003c/em\u003e, which was selected based on its phylogenetic placement as a basal lineage in Actinomycetota [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] and to enable clearer interpretation of codon usage evolution across derived clades.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eCodon usage data extraction\u003c/h2\u003e\u003cp\u003eAnnotated genomic data were processed to extract codon frequencies and their positional distribution within genes. The coding DNA sequences (CDSs) were retrieved from the GenBank files of the curated genomes. Relative codon positions were calculated by dividing the CDS length into 10 equal bins (referred to as position bins 0\u0026ndash;9), with bin 0 representing the 5\u0026rsquo; start of the gene and bin 9 representing the 3\u0026rsquo; end. Codon counts for each positional bin were computed for all CDSs in each genome. Codon frequency normalization was performed by calculating the proportion of each codon within its respective bin relative to the total codon counts across all bins for each genus. 
The number of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{t}\\text{R}\\text{N}\\text{A}}_{\\text{U}\\text{A}\\text{A}}^{\\text{L}\\text{e}\\text{u}}\\)\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{t}\\text{R}\\text{N}\\text{A}}_{\\text{U}\\text{C}\\text{U}\\:}^{\\text{A}\\text{r}\\text{g}}\\)\u003c/span\u003e\u003c/span\u003e genes, responsible for decoding the rare codons TTA and AGA respectively, was determined by parsing GenBank files for each strain and identifying tRNA features annotated with the corresponding anticodons in the anticodon qualifiers.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003eCodon rarity calculation\u003c/h2\u003e\u003cp\u003eCodon rarity was calculated to quantify the infrequency of each codon across all positional bins for each genus. First, the total counts of each codon across all bins were counted. Rarity was then calculated using a square root transformation of the inverse of the total codon counts:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\text{R}\\text{a}\\text{r}\\text{i}\\text{t}\\text{y}=\\:\\sqrt{\\frac{1}{\\text{T}\\text{o}\\text{t}\\text{a}\\text{l}\\:\\text{C}\\text{o}\\text{u}\\text{n}\\text{t}\\text{s}}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThe square root transformation was applied to reduce the skewness inherent in the distribution of codon counts, thereby emphasizing differences between rarer codons while maintaining the overall rarity trend. While the transformation may not substantially alter correlation outcomes compared to raw codon frequencies, it helps normalize extreme count differences and improve interpretability across codons with very different usage levels. 
To enable comparability across genera, min-max normalization was subsequently applied:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:\\text{N}\\text{o}\\text{r}\\text{m}\\text{a}\\text{l}\\text{i}\\text{z}\\text{e}\\text{d}\\:\\text{R}\\text{a}\\text{r}\\text{i}\\text{t}\\text{y}=\\:\\frac{\\text{R}\\text{a}\\text{r}\\text{i}\\text{t}\\text{y}-\\text{m}\\text{i}\\text{n}\\left(\\text{R}\\text{a}\\text{r}\\text{i}\\text{t}\\text{y}\\right)}{\\text{max}\\left(\\text{R}\\text{a}\\text{r}\\text{i}\\text{t}\\text{y}\\right)-\\text{m}\\text{i}\\text{n}\\left(\\text{R}\\text{a}\\text{r}\\text{i}\\text{t}\\text{y}\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThis normalization scaled the rarity values between 0 (least rare codon) and 1 (most rare codon). The normalized rarity values were subsequently used in all correlation analyses and visualizations to explore positional biases and relationships with codon enrichment across the genome.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003eCorrelation analysis between codon rarity and positional enrichment\u003c/h2\u003e\u003cp\u003ePearson and Spearman correlation coefficients were calculated between the normalized codon rarity values and their respective frequencies in each position bin (0\u0026ndash;9). Correlation coefficients were computed separately for each genus. The correlations were visualized through scatter plots with trendlines, highlighting the enrichment of rare codons in specific bins. 
Correlation trends across bins were summarized for all genera in tabular format, and genus-specific patterns were further analyzed for notable differences in codon behavior.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eCodon enrichment analyses in functional categories and BGCs\u003c/h2\u003e\u003cp\u003eGene products were assigned to functional categories using the eggNOG-mapper software version 2 [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e], based on the EggNOG database version 5 [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] and COG categories [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. To assess the statistical enrichment of genes containing rare codons, we employed hypergeometric tests separately for TTA and AGA codons. These tests determine whether codon-containing genes are overrepresented in specific COG categories compared to the overall gene distribution in each genome.\u003c/p\u003e\u003cp\u003eFor each strain, the probability of observing at least k codon-containing genes in each COG category was computed using the survival function of the hypergeometric distribution:\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:P(X\\ge\\:k)=1-\\sum\\:_{i=0}^{k-1}\\frac{\\left(\\begin{array}{c}K\\\\\\:i\\end{array}\\right)\\left(\\begin{array}{c}N-K\\\\\\:n-i\\end{array}\\right)}{\\left(\\begin{array}{c}N\\\\\\:n\\end{array}\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eN\u003c/em\u003e is the total number of genes in the genome,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eK\u003c/em\u003e is the total number of genes assigned to the given COG 
category,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003en\u003c/em\u003e is the total number of TTA- or AGA-containing genes in the genome,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003ek\u003c/em\u003e is the number of TTA- or AGA-containing genes observed in the COG category.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThis test was applied separately to each genome, so that the results reflect the biological variability across different strains. Categories with a \u003cem\u003ep\u003c/em\u003e-value below 0.05 were considered significantly enriched for TTA- or AGA-containing genes.\u003c/p\u003e\u003cp\u003eTo assess whether rare codons are preferentially located at the beginning of genes within specific functional categories, a separate hypergeometric test was performed using only genes containing the rare codon in position bin 0 (corresponding to the first 10% of the coding sequence). For each strain and each COG category, the number of genes with TTA or AGA codons located specifically in bin 0 was counted and compared to the total number of codon-containing genes across all bins. The test evaluated whether the presence of TTA or AGA codons in bin 0 was significantly overrepresented within a given COG category, compared to a background of all codon-containing genes. The same statistical formulation was used as in the overall enrichment analysis, with the adjusted counts reflecting positional restriction to bin 0. This analysis was performed for both TTA and AGA, thereby distinguishing functional categories where rare codons are not only enriched in general but also preferentially positioned at the start of genes.\u003c/p\u003e\u003cp\u003eAn analogous enrichment analysis was performed for BGCs to test whether the presence of rare codons is non-randomly associated with secondary metabolism. 
For each strain, antiSMASH 8.0 predictions were used to identify BGC genes, and the hypergeometric test was applied to determine whether genes containing TTA or AGA codons are statistically overrepresented within BGCs compared to the rest of the genome [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. Here, K corresponded to the total number of genes assigned to BGCs, and k to the number of rare-codon genes observed in BGCs.\u003c/p\u003e\u003cp\u003eTo assess enrichment of rare codons within functional gene categories inside BGCs, we applied Fisher\u0026rsquo;s exact test. For each strain and each antiSMASH functional category (biosynthetic, biosynthetic-additional, regulatory, resistance, transport, unknown, and others), a 2 \u0026times; 2 contingency table was constructed to compare codon-containing and codon-lacking genes inside the category versus all other BGC categories (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e):\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eContingency table for Fisher\u0026rsquo;s test.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCodon +\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eCodon 
-\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGenes in category\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cem\u003ea\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003eb\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGenes in other BGC categories\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cem\u003ec\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003ed\u003c/em\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003ewhere:\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003ea\u003c/em\u003e is the number of genes in the category that contain at least one TTA or AGA codon,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eb\u003c/em\u003e is the number of genes in the category that do not contain the codon,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003ec\u003c/em\u003e is the number of codon-containing genes in all other BGC categories,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003ed\u003c/em\u003e is the number of genes without the codon in all other BGC categories.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe test evaluates whether the proportion of TTA or AGA codon-containing genes in a given functional category is significantly greater than expected relative to all other BGC genes. 
Results were summarized at the genus level as the percentage of strains showing significant enrichment (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05, odds ratio\u0026thinsp;\u0026gt;\u0026thinsp;1) for TTA and AGA separately.\u003c/p\u003e\u003cp\u003eAll enrichment calculations were performed using the hypergeom.sf and fisher_exact functions from the SciPy statistical package in Python [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003cp\u003eNot applicable.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003cp\u003eNot applicable.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003ch2\u003eCompeting interests\u003c/h2\u003e\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis work was funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF Grant Number: NNF20CC0035580). T.J.B. and S.E.W. acknowledge support from Novo Nordisk Foundation Postdoctoral Fellowships (NNF Grant Numbers: NNF22OC0078997, NNF22OC0079021). H.U.K. acknowledges support from the National Research Foundation funded by the Korean government (RS-2024-00352229).\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eP.C. and A.R. initiated the study. A.R. performed computational analyses and prepared the figures. T.J.B. and P.C. provided analytical guidance and contributed to data interpretation. S.E.W., K.B., H.U.K. and T.W. contributed to biological interpretation and manuscript revision. 
All authors approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors thank Simon Shaw for assistance with antiSMASH analyses and colleagues from the Natural Product Genome Mining group at DTU Biosustain for helpful discussions.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eAll data are available as supplementary information in Additional Files 1 to 3. Table S1 contains the accessions of all genomes used in the study. The analysis code for this manuscript is available on GitHub at https://github.com/arudenko-2025/rare-codons-actinomycetota [50], and a version of the code is also deposited at Zenodo, https://doi.org/10.5281/zenodo.17285473 [51].\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eChater KF, Biro S, Lee KJ, Palmer T, Schrempf H. The complex extracellular biology of Streptomyces. FEMS Microbiology Reviews. 2010 3;34(2):171\u0026ndash;98. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1574-6976.2009.00206.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1574-6976.2009.00206.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChater KF, Chandra G. The use of the rare UUA codon to define expression space for genes involved in secondary metabolism, development and environmental adaptation in Streptomyces. Journal of Microbiology. 2008 2;46(1):1\u0026ndash;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s12275-007-0233-1\u003c/span\u003e\u003cspan address=\"10.1007/s12275-007-0233-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu H, Li J, Singh BK. Harnessing co-evolutionary interactions between plants and Streptomyces to combat drought stress. 
Nature Plants. 2024 7;10(8):1159\u0026ndash;1171. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41477-024-01749-1\u003c/span\u003e\u003cspan address=\"10.1038/s41477-024-01749-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoodfellow M, Williams ST. Ecology of actinomycetes. Annu Rev Microbiol. 1983;37:189\u0026ndash;216. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1146/annurev.mi.37.100183.001201\u003c/span\u003e\u003cspan address=\"10.1146/annurev.mi.37.100183.001201\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStach JEM, Bull AT. Estimating and comparing the diversity of marine actinobacteria. Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology. 2005 1;87(1):3\u0026ndash;9. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/S10482-004-6524-1\u003c/span\u003e\u003cspan address=\"10.1007/S10482-004-6524-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJ\u0026oslash;rgensen TS, Mohite OS, Sterndorff EB, Alvarez-Arevalo M, Blin K, Booth TJ, et al. A treasure trove of 1034 actinomycete genomes. Nucleic Acids Res. 2024;7(13):7487\u0026ndash;503. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/NAR/GKAE523\u003c/span\u003e\u003cspan address=\"10.1093/NAR/GKAE523\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBaltz RH. Gifted microbes for genome mining and natural product discovery. Journal of Industrial Microbiology and Biotechnology. 
2017 5;44(4\u0026ndash;5):573\u0026ndash;588. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10295-016-1815-x\u003c/span\u003e\u003cspan address=\"10.1007/s10295-016-1815-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003evan der Meij A, Worsley SF, Hutchings MI, van Wezel GP. Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiology Reviews. 2017 5;41(3):392\u0026ndash;416. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/femsre/fux005\u003c/span\u003e\u003cspan address=\"10.1093/femsre/fux005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOnaka H. Novel antibiotic screening methods to awaken silent or cryptic secondary metabolic pathways in actinomycetes. The Journal of antibiotics. 2017 7;70(8):865\u0026ndash;870. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/JA.2017.51\u003c/span\u003e\u003cspan address=\"10.1038/JA.2017.51\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePlotkin JB, Kudla G. Synonymous but not the same: The causes and consequences of codon bias. Nature Reviews Genetics. 2011 1;12(1):32\u0026ndash;42. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/nrg2899\u003c/span\u003e\u003cspan address=\"10.1038/nrg2899\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eQuax TEF, Claassens NJ, S\u0026ouml;ll D, van der Oost J. Codon Bias as a Means to Fine-Tune Gene Expression. Mol Cell. 2015;7(2):149\u0026ndash;61. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.MOLCEL.2015.05.035\u003c/span\u003e\u003cspan address=\"10.1016/J.MOLCEL.2015.05.035\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSharp PM, Li WH. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research. 1987 2;15(3):1281\u0026ndash;1295. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/NAR/15.3.1281\u003c/span\u003e\u003cspan address=\"10.1093/NAR/15.3.1281\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHershberg R, Petrov DA. Selection on codon bias. Annu Rev Genet. 2008;42:287\u0026ndash;99. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1146/annurev.genet.42.110807.091442\u003c/span\u003e\u003cspan address=\"10.1146/annurev.genet.42.110807.091442\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLeskiw BK, Bibb MJ, Chater KF. The use of a rare codon specifically during development? Molecular Microbiology. 1991;5(12):2861\u0026ndash;2867. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1365-2958.1991.tb01845.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1365-2958.1991.tb01845.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChater KF. Streptomyces inside-out: a new perspective on the bacteria that provide us with antibiotics. Philosophical Trans Royal Soc B: Biol Sci. 2006;361(1469):761. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1098/RSTB.2005.1758\u003c/span\u003e\u003cspan address=\"10.1098/RSTB.2005.1758\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSilov S, Zaburannyi N, Anisimova M, Ostash B. The Use of the Rare TTA Codon in Streptomyces Genes: Significance of the Codon Context? Indian Journal of Microbiology. 2020 3;61(1):24. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/S12088-020-00902-6\u003c/span\u003e\u003cspan address=\"10.1007/S12088-020-00902-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNguyen KT, Tenor J, Stettler H, Nguyen LT, Nguyen LD, Thompson CJ. Colonial Differentiation in Streptomyces coelicolor Depends on Translation of a Specific Codon within the adpA Gene. Journal of Bacteriology. 2003 12;185(24):7291\u0026ndash;7296. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1128/JB.185.24.7291-7296.2003\u003c/span\u003e\u003cspan address=\"10.1128/JB.185.24.7291-7296.2003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOhnishi Y, Yamazaki H, Kato JY, Tomono A, Horinouchi S. AdpA, a Central Transcriptional Regulator in the A-Factor Regulatory Cascade That Leads to Morphological Development and Secondary Metabolism in Streptomyces griseus. Bioscience, Biotechnology, and Biochemistry. 2005 1;69(3):431\u0026ndash;439. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1271/BBB.69.431\u003c/span\u003e\u003cspan address=\"10.1271/BBB.69.431\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWhite J, Bibb M. bldA Dependence of Undecylprodigiosin Production in Streptomyces coelicolor A3(2) Involves a Pathway-Specific Regulatory Cascade. J Bacteriol. 1997;179(3):627\u0026ndash;33. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1128/jb.179.3.627-633.1997\u003c/span\u003e\u003cspan address=\"10.1128/jb.179.3.627-633.1997\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRebets YV, Ostash BO, Fukuhara M, Nakamura T, Fedorenko VO. Expression of the regulatory protein LndI for landomycin E production in Streptomyces globisporus 1912 is controlled by the availability of tRNA for the rare UUA codon. FEMS Microbiology Letters. 2006 3;256(1):30\u0026ndash;7. https://doi.org/10.1111/J.1574-6968.2005.00087.X\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKoshla O, Lopatniuk M, Rokytskyy I, Yushchuk O, Dacyuk Y, Fedorenko V, et al. Properties of Streptomyces albus J1074 mutant deficient in \u003cdiv id=\"IEq5\" class=\"InlineEquation\"\u003e\u003cdiv format=\"TEX\" class=\"mathinline\" id=\"FileID_IEq5\" name=\"EquationSource\"\u003e\u003cscript type=\"math/tex; mode=inline\"\u003e\\:{\\text{t}\\text{R}\\text{N}\\text{A}}_{\\text{U}\\text{A}\\text{A}}^{\\text{L}\\text{e}\\text{u}}\u003c/script\u003e\u003c/div\u003e\u003c/div\u003e gene bldA. Arch Microbiol. 2017;10(8):1175\u0026ndash;83. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00203-017-1389-7\u003c/span\u003e\u003cspan address=\"10.1007/s00203-017-1389-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLee N, Kim W, Kim JH, Lee Y, Hwang S, Kim G et al. Regulatory orchestration of FK506 biosynthesis in Streptomyces tsukubaensis NRRL 18488 revealed through systematic analysis. iScience. 2025 5;p. 112698. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.ISCI.2025.112698\u003c/span\u003e\u003cspan address=\"10.1016/J.ISCI.2025.112698\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of expression in Escherichia coli. Science. 2009;4(5924):255\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/science.1170160\u003c/span\u003e\u003cspan address=\"10.1126/science.1170160\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTuller T, Waldman YY, Kupiec M, Ruppin E. Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci USA. 2010;2(8):3645\u0026ndash;50. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1073/pnas.0909910107\u003c/span\u003e\u003cspan address=\"10.1073/pnas.0909910107\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSauert M, Temmel H, Moll I. Heterogeneity of the translational machinery: Variations on a common theme. Biochimie. 2015;6:114:39\u0026ndash;47. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.biochi.2014.12.011\u003c/span\u003e\u003cspan address=\"10.1016/j.biochi.2014.12.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGingold H, Pilpel Y. Determinants of translation efficiency and accuracy. Mol Syst Biol. 2011;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/msb.2011.14\u003c/span\u003e\u003cspan address=\"10.1038/msb.2011.14\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHershberg R, Petrov DA. General rules for optimal codon choice. PLoS Genet. 2009;7(7). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pgen.1000556\u003c/span\u003e\u003cspan address=\"10.1371/journal.pgen.1000556\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePothier JF, Bolt V, Arn F, Frasson D, Rhyner N, Sievers M. High-Quality Draft Genome Sequence of Streptomyces albidoflavus CCOS 2040, Isolated from a Swiss Soil Sample. Microbiology Resource Announcements. 2023 3;12(3):01225\u0026ndash;22. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1128/MRA.01225-22\u003c/span\u003e\u003cspan address=\"10.1128/MRA.01225-22\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZaburannyi N, Rabyk M, Ostash B, Fedorenko V, Luzhetskyy A. Insights into naturally minimised Streptomyces albus J1074 genome. BMC Genomics. 2014;2(1):1\u0026ndash;11. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1471-2164-15-97/FIGURES/6\u003c/span\u003e\u003cspan address=\"10.1186/1471-2164-15-97/FIGURES/6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBaltz RH. Streptomyces and Saccharopolyspora hosts for heterologous expression of secondary metabolite gene clusters. J Ind Microbiol Biotechnol. 2010;8(8):759\u0026ndash;72. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10295-010-0730-9\u003c/span\u003e\u003cspan address=\"10.1007/s10295-010-0730-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBilyk O, Sekurova ON, Zotchev SB, Luzhetskyy A. Cloning and heterologous expression of the grecocycline biosynthetic gene cluster. PLoS ONE. 2016;7(7):e0158682. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0158682\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0158682\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMyronovskyi M, Rosenkranzer B, Nadmid S, Pujic P, Normand P, Luzhetskyy A. Generation of a cluster-free Streptomyces albus chassis strains for improved heterologous expression of secondary metabolite clusters. Metabolic Eng 2018 9;49:316\u0026ndash;24. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.YMBEN.2018.09.004\u003c/span\u003e\u003cspan address=\"10.1016/J.YMBEN.2018.09.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBooth TJ, Shaw S, Cruz-Morales P, Weber T. 
getphylo: rapid and automatic generation of multi-locus phylogenetic trees. BMC Bioinformatics. 2025 1;26(1):21. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12859-025-06035-1\u003c/span\u003e\u003cspan address=\"10.1186/s12859-025-06035-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNouioui I, Carro L, Garcia-Lopez M, Meier-Kolthoff JP, Woyke T, Kyrpides NC et al. Genome-based taxonomic classification of the phylum actinobacteria. Frontiers in Microbiology. 2018 8;9(AUG):355158. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/FMICB\u003c/span\u003e\u003cspan address=\"10.3389/FMICB\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 2018.02007/XML.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRoy M, Martial A, Ahmad S. Disseminated Nocardia beijingensis Infection in an Immunocompetent Patient. Eur J Case Rep Intern Med 2020 9;7(11). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.12890/2020_001904\u003c/span\u003e\u003cspan address=\"10.12890/2020_001904\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLarkin MJ, De Mot R, Kulakov LA, Nagy I. Applied aspects of Rhodococcus genetics. Antonie van Leeuwenhoek. Int J Gen Mol Microbiol. 1998;74(1\u0026ndash;3):133\u0026ndash;53. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1023/a:1001776500413\u003c/span\u003e\u003cspan address=\"10.1023/a:1001776500413\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLal D, Verma M, Behura SK, Lal R. 
Codon usage bias in phylum Actinobacteria: relevance to environmental adaptation and host pathogenicity. Res Microbiol. 2016;10(8):669\u0026ndash;77. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.resmic.2016.06.003\u003c/span\u003e\u003cspan address=\"10.1016/j.resmic.2016.06.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHackl S, Bechthold A. The Gene bldA, a Regulator of Morphological Differentiation and Antibiotic Production in Streptomyces. Arch Pharm. 2015;7(7):455\u0026ndash;62. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/ARDP.201500073\u003c/span\u003e\u003cspan address=\"10.1002/ARDP.201500073\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eClarke TF, Clark PL. Increased incidence of rare codon clusters at 5\u0026rsquo; and 3\u0026rsquo; gene termini: implications for function. BMC Genomics. 2010;118(11). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1471-2164-11-118\u003c/span\u003e\u003cspan address=\"10.1186/1471-2164-11-118\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCantalapiedra CP, Hernandez-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol. 2021;38(12):5825\u0026ndash;9. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/molbev/msab293\u003c/span\u003e\u003cspan address=\"10.1093/molbev/msab293\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuerta-Cepas J, Szklarczyk D, Heller D, Hernandez-Plaza A, Forslund SK, Cook H, et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;147(D1):D309\u0026ndash;14. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/nar/gky1085\u003c/span\u003e\u003cspan address=\"10.1093/nar/gky1085\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28(1):33\u0026ndash;6. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/nar/28.1.33\u003c/span\u003e\u003cspan address=\"10.1093/nar/28.1.33\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNikolaidis M, Hesketh A, Frangou N, Mossialos D, Van de Peer Y, Oliver SG, et al. A panoramic view of the genomic landscape of the genus Streptomyces. Microb Genomics. 2023;9(6). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1099/mgen.0.001028\u003c/span\u003e\u003cspan address=\"10.1099/mgen.0.001028\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBlin K, Shaw S, Vader L, Szenei J, Reitz Z, Augustijn H, et al. 
antiSMASH 8.0: extended gene cluster detection capabilities and analyses of chemistry, enzymology, and regulation. Nucleic Acids Res. 2025;4. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/nar/gkaf334\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkaf334\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eParks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;7(7):1043\u0026ndash;55. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/gr.186072.114\u003c/span\u003e\u003cspan address=\"10.1101/gr.186072.114\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOndov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, et al. Mash: Fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;6(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13059-016-0997-x\u003c/span\u003e\u003cspan address=\"10.1186/s13059-016-0997-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNguyen LV, Laval JP, Chainais P, Iop A, Blondel VD, Guillaume JL et al. Fast unfolding of communities in large networks. J Stat Mechanics: Theory Exp 2008 10;2008(10):P10008. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.0803.0476\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.0803.0476\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLetunic I, Bork P. 
Interactive tree of life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;7(W1):W293\u0026ndash;6. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/nar/gkab301\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkab301\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVirtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods. 2020 3;17(3):261\u0026ndash;272. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41592-019-0686-2\u003c/span\u003e\u003cspan address=\"10.1038/s41592-019-0686-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRudenko A. Code and analysis for Unraveling Rare Codon Bias in Actinomycetota: Lineage-Specific and 5\u0026rsquo; Terminal Enrichment Across 1936 Genomes. GitHub 2025. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/arudenko-2025/rare-codons-actinomycetota\u003c/span\u003e\u003cspan address=\"https://github.com/arudenko-2025/rare-codons-actinomycetota\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRudenko A. arudenko-2025/rare-codons-actinomycetota: v0.1-preprint. Zenodo. 2025. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5281/zenodo.17285473\u003c/span\u003e\u003cspan address=\"10.5281/zenodo.17285473\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gics","sideBox":"Learn more about [BMC Genomics](http://bmcgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/gics","title":"BMC Genomics","twitterHandle":"#BMCGenomics","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Codon Usage Bias, Actinomycetota, Secondary Metabolism, Gene Regulation","lastPublishedDoi":"10.21203/rs.3.rs-7996976/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7996976/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e\u003cp\u003eActinomycetota are a diverse phylum of major ecological, medical, and industrial importance, best known for producing antibiotics and other secondary metabolites. While some regulatory mechanisms of secondary metabolism are understood, many remain unresolved. Codon usage bias, the preferential use of synonymous codons, represents one potential layer of regulation, as it is known to influence translation efficiency and timing. The availability of thousands of high-quality genomes now enables codon usage to be examined across this phylum at unprecedented scale.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eWe analyzed codon usage across 1936 high- and medium-quality genomes from 11 genera. The most common codon across the dataset was GCC, particularly enriched in \u003cem\u003eStreptomyces albidoflavus\u003c/em\u003e. In contrast, TTA was consistently rare and showed variable distribution across genera. AGA was identified as another rare codon with especially strong enrichment at 5\u0026prime; termini. 
Both TTA and AGA were enriched in functional categories such as replication, transcription, and secondary metabolism, and were significantly overrepresented in biosynthetic gene clusters, particularly within biosynthetic and regulatory genes.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e\u003cp\u003eThese results show that rare codon usage in Actinomycetota reflects both evolutionary history and nonrandom positional enrichment, particularly at 5\u0026prime; termini, where it may fine-tune translation timing. This positional bias likely represents a conserved mechanism for coordinating gene expression. Beyond providing biological insight, our findings highlight the practical value of codon analysis for synthetic biology, metabolic engineering, and efforts to optimize the expression of biosynthetic gene clusters.\u003c/p\u003e","manuscriptTitle":"Unraveling Rare Codon Bias in Actinomycetota: Lineage-Specific and 5’ Terminal Enrichment Across 1936 Genomes","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-19 18:15:35","doi":"10.21203/rs.3.rs-7996976/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-12-30T10:11:50+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-09T19:47:50+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-17T13:02:34+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"269463713370506894390282897087544555035","date":"2025-11-10T18:25:39+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"309677216951762058524320748758628675507","date":"2025-11-10T10:05:09+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"52958135785454551553958219188932224186","date":"2025-11-10T08:55:40+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-11-09T22:21:12+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-11-04T15:52:13+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-11-03T03:05:03+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-11-03T03:04:43+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Genomics","date":"2025-10-31T09:31:17+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gics","sideBox":"Learn more about [BMC Genomics](http://bmcgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/gics","title":"BMC Genomics","twitterHandle":"#BMCGenomics","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"f1e682a6-fb33-43d0-b529-bccc459773dd","owner":[],"postedDate":"November 19th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-03-09T16:00:44+00:00","versionOfRecord":{"articleIdentity":"rs-7996976","link":"https://doi.org/10.1186/s12864-026-12708-9","journal":{"identity":"bmc-genomics","isVorOnly":false,"title":"BMC Genomics"},"publishedOn":"2026-03-05 15:57:06","publishedOnDateReadable":"March 5th, 2026"},"versionCreatedAt":"2025-11-19 18:15:35","video":"","vorDoi":"10.1186/s12864-026-12708-9","vorDoiUrl":"https://doi.org/10.1186/s12864-026-12708-9","workflowStages":[]},"version":"v1","identity":"rs-7996976","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7996976","identity":"rs-7996976","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}